EFFECTS OF DIRECTIONS REGARDING
GUESSING ON ITEM STATISTICS OF A
MULTIPLE-CHOICE VOCABULARY TEST


FRANCES SWINEFORD and PETER M. MILLER 


Educational Testing Service 
Princeton, N. J. 


When an examinee meets a test item which is totally unfamiliar 
to him, he may omit the item or he may make a wild guess. 
There is no such thing for him as a ‘shrewd guess’ when the item is 
completely divorced from his previous experience. If only an 
occasional examinee finds an occasional item of this nature, 
nothing he does can seriously affect either his score or the descrip- 
tive statistics of the test or items. If, on the other hand, a test 
were to contain a substantial number of items totally unfamiliar 
to a large proportion of the examinees, there might be a noticeable 
effect on test scores, test statistics, and item statistics. One of the 
aims of this study is to investigate such a situation. 

It is a further purpose of this study to investigate the amount of 
guessing that is likely to occur under different instructions to the 
examinee, to find what relationship may exist between amount of 
guessing and performance in the area covered by the test, and to 
determine the effects of guessing on various statistics.

In order to accomplish these purposes, a 100-item vocabulary 
test was constructed, containing eighty regular items of appro- 
priate difficulty for the group to be tested, ten extremely diffi- 
cult items, and ten nonsense items. The extremely difficult 
items contained stem words which appear in Webster’s New 
Collegiate Dictionary but which were unfamiliar to the writers and 
to their colleagues to whom the list of words was submitted; it is 
unlikely that any of these words would be familiar to the average 
college graduate. The nonsense items, which were ‘scored’ 
by an arbitrary, randomly devised key, contained stem ‘words’ 
which are not words at all. The items were not arranged in order
of difficulty. The twenty special items were distributed at
random throughout the test. They are listed below: 


Special Items

Difficult items:

11. MURICATE: 1-prickly 2-closed 3-purple 4-emblazoned 5-enfeebled
20. NUBILE: 1-marriageable 2-black 3-oily 4-predatory 5-removable
30. CONDIGN: 1-elevated 2-deserved 3-taciturn 4-fabricate 5-supplicate
34. HIRCINE: 1-epicene 2-placid 3-multifarious 4-goatlike 5-enslaved
53. LANNER: 1-rope 2-specialty 3-wrench 4-falcon 5-schooner
57. DECORTICATE: 1-husk 2-testify 3-swear 4-cleanse 5-affirm
59. GESTION: 1-animadversion 2-hint 3-altercation 4-humor 5-conduct
76. SAPID: 1-ailing 2-innocuous 3-palatable 4-uninformed 5-aged
78. JINK: 1-coin 2-dodge 3-crack 4-dance 5-jot
84. FAINAIGUE: 1-rumpus 2-mausoleum 3-cheat 4-invalidate 5-conspiracy

Nonsense items:

17. QUINTULENT: 1-feverish 2-faltering 3-acrid 4-prurient 5-decaying
27. TAMORIN: 1-shrew 2-drum 3-woodpecker 4-urn 5-associate
38. ARDICIAN: 1-handyman 2-suitor 3-barber 4-geologist 5-undertaker
44. PALIENT: 1-dim 2-important 3-sharp 4-twinkling 5-friendly
50. VENTRESCULATION: 1-wound 2-eloquence 3-latticework 4-window 5-profanity
60. HILN: 1-cupboard 2-handle 3-meadow 4-outhouse 5-relation
65. RHUSTATE: 1-inflamed 2-stopped up 3-frustrated 4-reddish 5-ill-bred
69. BRUNNAGE: 1-fog 2-anger 3-rigging 4-darkness 5-effrontery
72. SUSCERN: 1-see 2-dissociate 3-suspect 4-worry 5-repudiate
88. WALDER: 1-meander 2-lancer 3-renege 4-mason 5-mend

Note: The most popular responses to the nonsense items are as follows:
17(2); 27(4); 38(1); 44(1); 50(2); 60(2); 65(4); 69(5); 72(3); 88(1).

The test was administered in three forms, which differed only in
the cover-page instructions. Following detailed directions for 
proper use of the separate answer sheet, directions which were the 
same on all forms, the remaining directions were as follows: 


Form 1 
It is not expected that everyone will finish in the time allowed. Do 
not hurry, but work steadily and as quickly as you can without 
sacrificing accuracy. 
You will be given 30 minutes to work on this test.

Form 2 
It is not expected that everyone will finish in the time allowed. Do 
not hurry, but work steadily and as quickly as you can without 
sacrificing accuracy. 
You are advised to answer a question only if you are sure of the 
answer. You should not guess since wrong answers will result in a 
subtraction from the number of your correct answers. 
You will be given 30 minutes to work on this test. 

Form 3
It is not expected that everyone will finish in the time allowed. Do
not hurry, but work steadily and as quickly as you can without 
sacrificing accuracy. 
Answer all questions about which you have any knowledge. You 
are advised, further, to make a guess on unfamiliar words: a shrewd 
guess is more often right than wrong. Your score on this test will 
be based on the number of your correct answers; no deduction will 
be made for wrong guesses. 
You will be given 30 minutes to work on this test. 


The three forms were distributed in such a way that every third 
examinee received the same form. No mention was made of the 
fact that there were different forms, and it is unlikely that the 
examinees were aware of that fact before they had an opportunity, 
later, for discussions with one another. This method of dis- 
tributing the forms among as many as eight hundred examinees 
virtually assures samples that are equivalent for all practical 
purposes. We shall be interested only in experimental differ- 
ences which are significantly greater than the sampling differ- 
ences that are likely to arise. 

When this study was proposed, three hypotheses were set down;
namely,

1) There will occur some guessing on the part of all three
groups.

2) The amount of guessing will vary only slightly with differences
in instructions.

3) There is little or no relationship, either positive or negative,
between ability and the tendency to guess.

No hypotheses were offered concerning the test statistics. 
These will be examined, however. 

Seven scores were obtained on each paper. They are:

A. Score (R) on eighty regular items.
B. Score (R) on ten difficult items.
C. ‘Score’ (R) on ten nonsense items.
D. Total score (A + B + C).
E. Number of responses to ten difficult items.
F. Number of responses to ten nonsense items.
G. Total number of responses to special items (E + F).

There were 267 cases who took each form, 801 cases in all. In 
Table 1 are listed the means and standard deviations of the seven 
scores.¹

The means for Score A (eighty regular items) are remarkably 
similar. Although the mean for Form 2 (do not guess) is the 
smallest and that for Form 3 (try all items) is the largest, as 
would be expected from the nature of the directions, their differ- 
ence of less than two score points is both unimportant practically 
and no greater than one which might reasonably occur by chance. 

Taking into consideration the number of difficult items at- 
tempted (Score E), we find the means for Score B (ten difficult 
items) to be extremely close to the expected chance means of 
1.84, 1.10, and 1.91. The significant differences for Score B 
between Form 2 and the other two forms are, therefore, related 
to the number of items marked rather than to knowledge of the 
words themselves. 
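
The chance means quoted above can be checked with a short calculation. The sketch below assumes blind guessing among the five alternatives of each item, so that the expected number right is one-fifth of the number attempted. The attempt counts are the Score E means from Table 1; the function and variable names are illustrative only.

```python
# Expected number right under blind guessing on five-choice items:
# each attempted item has a 1-in-5 chance of agreeing with the key.
N_CHOICES = 5  # alternatives per item, as in the item list

# Mean number of difficult items attempted (Score E, Table 1), by form.
mean_attempted = {"Form 1": 9.18, "Form 2": 5.52, "Form 3": 9.54}


def chance_mean(attempted, n_choices=N_CHOICES):
    """Expected number right if every attempt is a blind guess."""
    return attempted / n_choices


chance_means = {form: round(chance_mean(e), 2) for form, e in mean_attempted.items()}
print(chance_means)
```

Under this assumption the figures agree with the 1.84, 1.10, and 1.91 given in the text.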

TABLE 1. MEANS AND STANDARD DEVIATIONS OF SEVEN SCORES
ON EACH FORM

                                Form 1          Form 2          Form 3
Score                         Mean    SD      Mean    SD      Mean    SD
Number right:
  A (80 regular items)       42.87  11.33    41.25  11.34    42.91  10.70
  B (10 difficult items)      1.83   1.31     1.03   1.11     1.96   1.34
  C (10 nonsense items)       2.48   1.36     1.63   1.56     2.69   1.44
  D (A + B + C)              47.18  11.94    43.92  12.06    47.56  11.37
Number of responses:
  E (10 difficult items)      9.18   2.16     5.52   4.10     9.54   1.59
  F (10 nonsense items)       9.16   2.19     5.48   4.13     9.50   1.67
  G (E + F)                  18.34   4.28    11.00   8.14    19.04   3.19

A curious result occurs in connection with Score C (ten nonsense
items). Each of the obtained means, 2.48, 1.63, and 2.69, is
significantly greater than the corresponding expected chance
figure: 1.84, 1.10, and 1.90. Thus the examinees tend to agree
with each other and with the arbitrary key to an extent which
cannot be explained by chance alone. To test the possibility
that the selection of particular responses by the examinees might
be associated with the ability measured by the eighty regular
items (Score A), the correlation between Scores A and C was
computed for the Form 3 group. The correlation, .105, does not
differ significantly from zero. (Item biserial correlations, to be
presented later, constitute another expression of the same finding.)
We may conclude, therefore, that the arbitrary key happened to
include some unusually attractive keyed answers which were
almost equally attractive to examinees of all levels of ability as
measured by Score A. This being so, another arbitrary key
might equally well have produced mean scores that are substantially
under ‘chance’ values if it did not include any of the popular
choices. (The most popular alternatives have been indicated
below the set of items given earlier in this report.) Apart
from the unexpectedly high values, the Score C means bear the
same relationships to one another as do the Score B means. In
each instance the mean for Form 2 is lower than the other two
means by statistically significant amounts.²

1 Should the reader be interested in the score distributions, they may be
obtained from the authors at Educational Testing Service, Princeton, New
Jersey.





2 Group comparisons for Scores A, B, and C have been made by the
method of analysis of variance. The F ratios are 1.91, 42.2, and 39.3, 
respectively. With 2 and 798 degrees of freedom in each instance it is 
clear that the between groups variance is not significant for Score A but is 
significant beyond any doubt for Scores B and C. 
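
The F ratios in this footnote can be approximated from the summary figures of Table 1 alone. The sketch below is a one-way analysis of variance for three groups of equal size (267 each). It assumes that the tabled standard deviations are within-group values computed with the usual n − 1 divisor, which the report does not state, so agreement with the published ratios is only approximate; all names are illustrative.

```python
# One-way analysis of variance rebuilt from group means and SDs (equal n).
N_PER_GROUP = 267

# (means, standard deviations) for Forms 1, 2, 3, copied from Table 1.
table1 = {
    "A": ([42.87, 41.25, 42.91], [11.33, 11.34, 10.70]),
    "B": ([1.83, 1.03, 1.96], [1.31, 1.11, 1.34]),
    "C": ([2.48, 1.63, 2.69], [1.36, 1.56, 1.44]),
}


def f_ratio(means, sds, n):
    """F = between-groups mean square / within-groups mean square."""
    k = len(means)
    grand_mean = sum(means) / k  # valid because the groups are equal in size
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # With equal n the pooled variance is the simple average of the variances
    # (assumption: each SD was computed with an n - 1 divisor).
    ms_within = sum(sd ** 2 for sd in sds) / k
    return ms_between / ms_within


f_ratios = {score: f_ratio(m, s, N_PER_GROUP) for score, (m, s) in table1.items()}
for score, f in f_ratios.items():
    print(score, round(f, 1))
```

The recomputed values fall close to the reported 1.91, 42.2, and 39.3; the small discrepancies are of the size expected from rounding in the table.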
On Score D (score on all one hundred items) the means for
the Form 1 and the Form 3 groups are very similar. Their 
difference of 0.38 divided by its standard error, 1.01, is only 0.38. 
The mean for the Form 2 group, however, is less than the other 
two means by amounts which are in each case more than three 
times the standard error of the difference and, hence, which are 
statistically significant beyond the one per cent level of confidence. 
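
The comparisons of Score D means can be retraced as follows. The sketch assumes independent groups of 267 cases each and takes the standard error of a difference between means as the square root of the sum of the two squared standard errors; the means and standard deviations are those of Table 1, and the helper name is illustrative.

```python
import math

N = 267  # cases per form

# Score D (mean, SD) for each form, from Table 1.
score_d = {
    "Form 1": (47.18, 11.94),
    "Form 2": (43.92, 12.06),
    "Form 3": (47.56, 11.37),
}


def critical_ratio(form_a, form_b):
    """Return (difference / standard error, standard error) for two forms."""
    (m_a, sd_a), (m_b, sd_b) = score_d[form_a], score_d[form_b]
    se = math.sqrt(sd_a ** 2 / N + sd_b ** 2 / N)
    return (m_a - m_b) / se, se


ratio_31, se_31 = critical_ratio("Form 3", "Form 1")
ratio_12, _ = critical_ratio("Form 1", "Form 2")
ratio_32, _ = critical_ratio("Form 3", "Form 2")
print(round(se_31, 2), round(ratio_31, 2), round(ratio_12, 2), round(ratio_32, 2))
```

The Form 1 and Form 3 comparison reproduces the standard error of 1.01 and the ratio of 0.38, and both comparisons with Form 2 exceed three standard errors.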

Scores E (number of responses to ten difficult items) and F 
(number of responses to ten nonsense items) are so similar that 
they can be considered together. For both Form 1 and Form 3 
the mean number of items tried out of each set of ten items is 
more than 9.00. Nevertheless, the small difference of less than 
one-half of an item between these groups is statistically signifi- 
cant at the five per cent level of confidence. The Form 2 group, 
on the other hand, followed the instructions not to guess to the 
extent that the mean number of items answered is about 5.5 
for each set of items. This figure, however, by no means repre- 
sents the typical Form 2 examinee. Despite the instructions, 
nearly forty per cent of the students tried all the special items. 
An equal number tried no more than twenty-five per cent of the 
items, whereas the rest are scattered through the remaining 
score range. In contrast, about eighty per cent of the Form 1 
and Form 3 groups tried all the special items, and no secondary 
mode appears at the low end of the distributions. 

The means for Score G (E + F), of course, add no new informa- 
tion, but from the standard deviations of E, F, and G there can 
be calculated the correlations, r_EF, which are .932 for Form 1,
.954 for Form 2, and .924 for Form 3. Correlations of this magni- 
tude, based on so small a number of items, indicate an unusually 
high degree of consistency of response. Reaction to the difficult 
items was essentially the same as reaction to the nonsense items. 
It should be noted that the correlations are somewhat inflated
by the large numbers of examinees who tried all the special 
items. If all such cases are omitted from the calculations, how- 
ever, values of .87, .83, and .85 are obtained, which may properly 
be regarded as lower bounds of the relationship that actually 
exists. This extremely consistent behavior appears from the 
present data to be in the nature of a ‘response set,’ determined in 
part by the test directions and in part by the personality or 
previously established habits of the examinee. 
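
The way a correlation follows from three standard deviations can be sketched briefly. Since G = E + F, the variance of G equals the variance of E plus the variance of F plus twice their covariance, and solving for the correlation gives the expression used below. The inputs are the rounded standard deviations of Table 1, so the results only approximate the published values; the names are illustrative.

```python
# Recover r_EF from the standard deviations of E, F, and G = E + F:
#   var(G) = var(E) + var(F) + 2 * r_EF * sd(E) * sd(F)

# (SD of E, SD of F, SD of G) for each form, from Table 1.
table1_sds = {
    "Form 1": (2.16, 2.19, 4.28),
    "Form 2": (4.10, 4.13, 8.14),
    "Form 3": (1.59, 1.67, 3.19),
}


def r_from_sds(sd_e, sd_f, sd_g):
    """Correlation between two scores, given their SDs and the SD of their sum."""
    return (sd_g ** 2 - sd_e ** 2 - sd_f ** 2) / (2 * sd_e * sd_f)


r_ef = {form: r_from_sds(*sds) for form, sds in table1_sds.items()}
for form, r in r_ef.items():
    print(form, round(r, 3))
```

Because the tabled standard deviations carry only two decimals, the recomputed coefficients differ from .932, .954, and .924 in the second or third decimal place.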
TABLE 2. FREQUENCY DISTRIBUTIONS OF NUMBER OF OMISSIONS
FOR EACH ITEM

Number of        Form 1              Form 2              Form 3
Omissions    Regular  Special    Regular  Special    Regular  Special
              Items    Items      Items    Items      Items    Items
140-149                                       1
130-139                                       4
120-129                                       8
110-119                              1        1
100-109                              3        5
 90- 99                              ?        1
 80- 89                              ?
 70- 79                              5
 60- 69                              4
 50- 59                              7
 40- 49                              5
 30- 39                             14
 20- 29         3       13           8                   1
 10- 19        12        7           8                   9       18
  0-  9        65                   22                  70        2
Total          80       20          80       20         80       20
Mean          6.8     21.0        34.8    120.5        5.9     13.5

The relationship between ability in the area of the test and this
tendency to respond to the special items can be measured by the
correlations between Score A and Score G. These are .039 for
Form 1; .160 for Form 2; and .007 for Form 3. The Form 2
correlation, although too low to be of practical importance, is
statistically significant at the one per cent level of confidence.
In other words, when instructions were given not to guess, there
was a slight but real tendency for the more able students to
disregard the instructions and to attempt the special items. The
Form 1 and Form 3 correlations can be regarded as chance
deviations from zero.
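
The significance statements about the three correlations between Score A and Score G can be illustrated with the usual t statistic for a correlation coefficient. The sketch below is a large-sample approximation, not necessarily the authors' procedure: with 265 degrees of freedom the t distribution is very nearly normal, so the one per cent two-tailed critical value is taken from the normal curve.

```python
import math
from statistics import NormalDist

N = 267  # cases per form

r_by_form = {"Form 1": 0.039, "Form 2": 0.160, "Form 3": 0.007}


def t_for_r(r, n):
    """t statistic for testing a correlation against zero (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Normal approximation to the two-tailed one per cent critical value.
Z_01 = NormalDist().inv_cdf(0.995)

significant_01 = {
    form: abs(t_for_r(r, N)) > Z_01 for form, r in r_by_form.items()
}
print(significant_01)
```

Only the Form 2 coefficient passes the one per cent criterion under this approximation, in agreement with the text.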
The effect of the different instructions on examinee behavior is
strikingly portrayed in another fashion in Table 2, which gives 
frequency distributions of the number of omissions per item. 
(An unanswered item is considered an omission only if a later 
item has been answered. Otherwise it is classified as ‘not 
reached.’) Similarities and differences are so clear-cut that tests 
of statistical significance are not necessary. Under all the con- 
ditions the special items were omitted more frequently than the 
regular, easier items, but the difference between special and 
regular items is particularly great for Form 2 (instructions not to 
guess). Only a few more omissions occur for Form 1 (no instruc- 
tions on guessing) than for Form 3 (instructions to answer every 
item). 

If the different instructions affect the typical examinee’s rate of 
work, proper timing under various conditions becomes an impor- 
tant problem. The use of a speeded test would be necessary to 
establish typical rates of work. In the present study the test was 
not speeded, and no answers to this problem can be offered. The 
data on the speededness are summarized below in tabular form: 


                                        Form 1   Form 2   Form 3
Per cent of examinees who reached
  Item 100 ...........................   97.4     91.8     97.8
Per cent of examinees who reached
  98 or more items ...................   98.9     98.9     98.9


It happened that in each group 264 of the 267 examinees reached 
Item 98. It is possible that virtually all the unanswered items 
at the end of the test are in the nature of omitted items, that is, 
items read but intentionally not answered, rather than items not 
reached for lack of time. 

The inclusion of the twenty special items had a deleterious effect 
on the test reliability. Reliability coefficients were computed by 
the Kuder-Richardson formula (20) for the 80-item tests and the 
100-item tests. They are presented in Table 3, which also 
includes the Spearman-Brown predicted coefficients for 100-item 
tests that would be obtained by adding twenty regular items to 
the original eighty items. 
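
For readers unfamiliar with the Kuder-Richardson formula (20), a minimal sketch is given here. The response matrix is hypothetical (five examinees, four items scored 0 or 1) and serves only to show the arithmetic; the item data of the present study are not reproduced in this report.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    responses: one list per examinee, one 0/1 entry per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    # Variance of total scores (divisor n, matching the p * q item variances).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people  # proportion passing item j
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / var_total)


# Hypothetical data: items ordered from easiest to hardest.
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy_responses), 3))
```

Items that are answered at random add to the sum of item variances without adding proportionately to the total-score variance, which is why such items tend to lower the coefficient.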

TABLE 3. COEFFICIENTS OF RELIABILITY

                                          100-Item Test     Difference
        80-Item Test    100-Item Test     Predicted from    Column (3) Minus
Form    (Score A)       (Score D)         80-Item Test      Column (4)
(1)     (2)             (3)               (4)               (5)
1       .892            .879              .912              −.033
2       .896            .892              .915              −.023
3       .880            .866              .902              −.036

One might expect the special items to be more or less ‘dead
wood’ in the test, with no major systematic effect on the test
reliability. This is essentially the case. It is interesting to note,
however, not only that all three coefficients in Column (3) of the
table are lower than corresponding values in Column (2) but also
that the decrease, small though it may be, is in this instance
directly related to the number of responses made to the special
items. The differences in Column (5) show the loss in efficiency
due to the presence of the special items.
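
The predicted coefficients of Table 3 follow from the Spearman-Brown formula with a lengthening factor of 100/80. The sketch below reproduces Columns (4) and (5) from Columns (2) and (3); the function name is illustrative.

```python
def spearman_brown(r, k):
    """Reliability of a test lengthened k times with comparable items."""
    return k * r / (1 + (k - 1) * r)


K = 100 / 80  # eighty regular items extended to one hundred

# (80-item reliability, observed 100-item reliability) for Forms 1, 2, 3.
table3 = {"Form 1": (0.892, 0.879), "Form 2": (0.896, 0.892), "Form 3": (0.880, 0.866)}

predicted = {form: round(spearman_brown(r80, K), 3) for form, (r80, _) in table3.items()}
loss = {form: round(r100 - predicted[form], 3) for form, (_, r100) in table3.items()}
print(predicted)
print(loss)
```

The loss is smallest for Form 2, the form on which the fewest special items were attempted.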

Let us now consider item statistics. The difficulty of each item 
was measured in terms of ‘delta.’ Delta is computed by first 
obtaining the normal deviate which corresponds to the proportion 
answering the item correctly and then transmuting it to a scale 
with mean of 13 and standard deviation of 4. The higher the 
delta, the more difficult the item. The biserial correlation 
between item score and criterion score was computed twice for 
each item: once with Score A as criterion and once with Score D 
as criterion. It was expected that the presence of the special 
items in the criterion score would decrease the correlations for the 
regular items. Means and standard deviations of these statistics 
are presented in Table 4. 
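
The delta index described above can be sketched as follows. The sign convention is an assumption inferred from the statement that a higher delta means a more difficult item: the normal deviate is taken as the point above which the proportion answering correctly lies, so that an item passed by half the group has a delta of 13. The proportions in the example are hypothetical.

```python
from statistics import NormalDist

DELTA_MEAN = 13
DELTA_SD = 4


def delta(p_correct):
    """Item difficulty on the delta scale for a proportion correct in (0, 1)."""
    # Normal deviate cutting off the upper p_correct of the curve (assumed
    # convention): small proportions correct give large positive deviates.
    z = NormalDist().inv_cdf(1 - p_correct)
    return DELTA_MEAN + DELTA_SD * z


for p in (0.80, 0.50, 0.20):  # hypothetical proportions correct
    print(p, round(delta(p), 2))
```

On this scale a proportion correct near the chance level of .20 corresponds to a delta between 16 and 17, the region in which the special items of Table 4 fall.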

The difference of .24 between the mean delta for the regular 
items of Form 2 and the means for Forms 1 and 3 appears too 
small to be of great import. In the case of the twenty special 
items, the differences between the mean deltas are greater than 
the differences for the regular items. All three differences 
between forms are statistically significant beyond the one per 
cent level of confidence, even though only twenty pairs of 
observations are involved in each instance.³ For very difficult
items a change in directions is likely to result in a change in the
index of item difficulty sufficient to affect the precision of item
equating.⁴ Without further experimentation, this generalization
applies to additional items but does not extend beyond the
particular groups employed, although there is no reason to believe
that the present groups are atypical with respect to these results.
Similar experiments could be expected to yield results of the same
nature as those reported here.

TABLE 4. MEANS AND STANDARD DEVIATIONS OF ITEM STATISTICS

                               Form 1          Form 2          Form 3
Statistic                    Mean    SD      Mean    SD      Mean    SD
Delta
  80 regular items          12.81   2.58    13.05   2.71    12.81   2.61
  20 special items          16.26   1.18    17.63   1.17    16.02   1.10
r_bis
  80 regular items:
    Score A criterion        .440   .115     .453   .119     .423   .127
    Score D criterion        .433   .110     .446   .113     .417   .128
  20 special items:
    Score A criterion        .070   .148     .130   .109     .086   .146
    Score D criterion        .127   .138     .220   .109     .140   .152

3 The standard errors of these and other differences based on the data of
Table 4 were computed from the formula
$\sigma_d^2 = \sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y r_{xy}$.

All mean biserial correlations for the regular items computed 
with Score D as criterion are smaller than the corresponding 
means with Score A as criterion. Although the differences are 
statistically significant beyond the one per cent level of confi- 
dence, they are very small, too small to be of any practical 
consequence. For the three forms the means of the differences 
are, respectively, .0067, .0075, and .0067. The standard devia- 
tions are .0164, .0243, and .0184; and the standard errors of the 
means are .0019, .0028, and .0021. For the twenty special items 
a significant increase appears. This result is without doubt due 
to the fact that the special-item scores are included in Score D 
but not in Score A, and it can be explained entirely on this basis. 
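
The significance of the small mean differences for the regular items can be retraced from the figures just given. The sketch assumes that each standard error is the standard deviation of the eighty item differences divided by the square root of eighty, and it compares each critical ratio with 2.64, the approximate one per cent two-tailed value of t for 79 degrees of freedom taken from standard tables.

```python
import math

N_ITEMS = 80  # regular items, one difference per item

# Figures quoted in the text for Forms 1, 2, and 3.
mean_diff = [0.0067, 0.0075, 0.0067]
sd_diff = [0.0164, 0.0243, 0.0184]
reported_se = [0.0019, 0.0028, 0.0021]

T_CRITICAL_01 = 2.64  # approximate tabled value for 79 df (assumption)


def se_of_mean(sd, n):
    """Standard error of a mean of n paired differences."""
    return sd / math.sqrt(n)


computed_se = [se_of_mean(sd, N_ITEMS) for sd in sd_diff]
critical_ratios = [m / se for m, se in zip(mean_diff, reported_se)]
print([round(se, 4) for se in computed_se])
print([round(cr, 2) for cr in critical_ratios])
```

The recomputed standard errors agree with the reported ones to within rounding, and every ratio exceeds the one per cent value even though the differences themselves are under .01.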

Differences in mean biserial correlations between forms tend not
to be statistically significant. For the regular items the two
Form 2-minus-Form 3 differences are significant at the five per
cent level but not at the one per cent level. In the case of the
special items, none of the differences between values in the first
row is significant, but when Score D is used as criterion the Form
2 mean is greater than either of the other means by amounts
which are significant at the five per cent level of confidence.

4 For a discussion of item equating see L. L. Thurstone, “The calibration of
test items,” The American Psychologist, 1 (March, 1947), 103-4.

The findings of this study may be summarized in part in terms 
of the hypotheses set down at the beginning of this report. 

1) There did occur some guessing on the part of all three 
groups. For no group was the mean number of responses to the 
set of special items less than fifty per cent. (The special items 
are the very difficult items and nonsense items, which invite
guessing.)

2) The amount of guessing varied with differences in instruc- 
tions. The variation was slight but statistically significant 
between the group receiving no instructions about guessing and 
the group told to answer all the items. The group told not to 
guess responded to substantially fewer items than either of the 
other two groups. 

3) The relationship between ability and the tendency to 
guess is very low. The correlation between the score on the 
regular items and the number of responses to the special items is 
positive for each group, but only one coefficient, that for the 
group told not to guess, differs significantly from zero. 

Other findings and implications are as follows: 

4) The classifications, ‘very difficult’ and ‘nonsense,’ may be 
useful only to the test writer. To the examinee there may be no 
difference, for he appears to respond in the same way to both. 
Thus, a very difficult item, though it may have a perfectly sound 
answer, may act in the test in the same way as though it were 
a nonsense item.

5) Too many very difficult items in a test can lower its 
reliability. 

6) Instructions which discourage guessing may reduce accurate 
comparison of groups and test forms when the comparison 
involves also a test with instructions to guess or no instructions 
about guessing, where such comparisons are made through 
measures of item difficulty.

7) Measures of internal validity of individual items are not 
seriously affected by including some very difficult items in the 
criterion. In the present study twenty per cent of the criterion 
Score D were ‘special’ items. Normally, the percentage of 
difficult items would be lower than this figure. 


RELATIVE CONTRIBUTIONS OF APTITUDE AND
WORK HABITS TO ACHIEVEMENT IN COLLEGE
MATHEMATICS¹


WILLIAM C. KRATHWOHL 


Institute for Psychological Services 
Illinois Institute of Technology 


A question which frequently arises is, “How much is achievement
in a subject influenced by ability and how much is it
influenced by such personality factors as work habits of
industriousness and indolence?”² A solution to this problem for
English was given by Krathwohl (5,6). He found that a meas- 
ure of work habits of industriousness and indolence for English 
could be secured by defining an index of industriousness for 
English to be equal to the score, for an individual, on an English 
achievement test minus his score on a vocabulary test provided 
both scores were based on the same group (7). The scores used 
for both tests were derived scores which have a mean of 20 and a 
standard deviation of 4. These scores can readily be transformed 
to the more familiar standard scores with a mean of 50 and a 
standard deviation of 10 by multiplying the derived scores by 
2½. By means of this index of industriousness for English he
found that if students were grouped into above average, average, 
and below average groups according to ability in English as 
measured by vocabulary scores, work habits contributed as much 
or more toward achievement in English as did vocabulary. If, 
however, students were grouped according to indexes of indus- 
triousness for English into industrious, normal and indolent 
groups, practically all of the variance of achievement in English 
was accounted for by the vocabulary scores and practically none 
by the indexes of industriousness for English. Nevertheless,
the interesting fact appeared that far superior prediction results
for English achievement were obtained from vocabulary scores
than by means of any other method, provided the predictions
were made separately for each of three groups: the industrious,
normal, and indolent groups.

1 Presented at the Annual Meeting of the Midwestern Psychological
Association at Cleveland, Ohio, April 25, 1952.

2 For conciseness and also to avoid awkward construction, the word
indolence, as employed in this investigation, is used not in a derogatory sense,
but rather as a substitute for under-achievement. In the same way, the
word industriousness is used as a substitute for over-achievement.


The question now becomes, can the same conclusions be drawn 
for mathematics as were drawn for English, particularly since it 
was shown by Krathwohl (8) that work habits in mathematics are 
independent of work habits in English. In other words, he 
found that an individual could be very industrious in mathe- 
matics and at the same time could be very indolent in English. 
To answer this question it is necessary to measure work habits 
for mathematics. This was done (3,4) by defining an index of 
industriousness for mathematics to be the score that an individual 
makes on a mathematics achievement test minus his score on 
a mathematics aptitude test. The scores used for both tests were 
derived scores which have a mean of 20 and a standard deviation 
of 4. For college algebra, the scores used were the grades 
A, B, C, D, and E which were replaced by the numbers 3, 2, 1, 0, 
and −1, respectively.
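
The definition of the index can be made concrete with a short sketch. It assumes that derived scores are obtained by a linear standardization of raw scores to a mean of 20 and a standard deviation of 4 within the group, which the text implies but does not spell out; the raw scores are hypothetical and the function names are illustrative.

```python
from statistics import mean, pstdev


def derived_scores(raw_scores):
    """Rescale raw scores to mean 20 and SD 4 within the group (assumed method)."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    return [20 + 4 * (x - m) / s for x in raw_scores]


def industriousness_index(achievement_raw, aptitude_raw):
    """Index of industriousness: derived achievement minus derived aptitude."""
    ach = derived_scores(achievement_raw)
    apt = derived_scores(aptitude_raw)
    return [a - b for a, b in zip(ach, apt)]


def to_standard_score(derived):
    """Convert a derived score (mean 20, SD 4) to the 50/10 scale."""
    return 2.5 * derived


# Hypothetical raw scores for five students on the two tests.
achievement_raw = [52, 61, 47, 70, 58]
aptitude_raw = [30, 41, 35, 44, 33]

indexes = industriousness_index(achievement_raw, aptitude_raw)
print([round(i, 1) for i in indexes])
```

A positive index marks a student whose achievement stands higher in the group than his aptitude, and a negative index the reverse; because both sets of derived scores have the same mean, the indexes for a group average zero.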

In order to ascertain if the results previously found for English 
also held for mathematics, a group of 859 freshmen at the Illinois 
Institute of Technology were selected who took orientation tests 
between September, 1947, and February, 1949. The mathematics
aptitude test selected was the Iowa Mathematics Aptitude
Test, Form M. The mathematics achievement test which was 
used to compute the index of industriousness for mathematics 
was a locally prepared test on algebra and mensuration, called 
the Mathematics Preparation Test. Both of these tests were 
given before the student entered the Institute. The college 
mathematics achievement test from which the relative effects of 
aptitude and work habits were to be computed was college
algebra, which was given at the end of the first term to students 
who were properly prepared in algebra and one term later if a 
review of high-school algebra were necessary. The investigation 
of the relative contribution of mathematics aptitude and indexes 
of industriousness for mathematics was made by means of 
multiple correlation techniques and the results of the investigation 
are shown in Table I. 

TABLE I. INTERCORRELATION AND MULTIPLE CORRELATION
COEFFICIENTS AMONG (1) COLLEGE ALGEBRA, (2) MATHEMATICS
APTITUDE, AND (3) INDEXES OF INDUSTRIOUSNESS
FOR MATHEMATICS WITH THEIR RELATIVE CONTRIBUTIONS

Column No.        1     2      3      4      5       6       7     8
                                                    Math.
Group             N    r12    r13    r23   R1.23    Apt.    I.I.   K²
All              859   .46    .08   −.51    .59      31       3    65
Above average    286   .34    .41   −.04    .53      12      17    71
Average          362   .10    .04   −.27    .42       2      16    82
Below average    211   .17    .21   −.44    .37       5       7    87
Industrious      216   .53   −.08   −.44    .56      33      −1    69
Normal           430   .59    .08   −.18    .62      37       2    62
Indolent         213   .49    .15   −.29    .58      29       5    67

In Table I, the columns are numbered from 1 to 8 to facilitate
discussion. Wherever the subscript 1 appears, it refers to the
college algebra grade, the subscript 2 to the mathematics aptitude
derived score and the subscript 3 to the index of industriousness
or I.I. for mathematics.

The first column gives the frequency of the group being 
investigated. The second column gives the correlation coeffi- 
cients between college algebra grades and mathematics aptitude 
scores. The third column gives the correlation coefficients 
between college algebra grades and indexes of industriousness or 
I.I.’s for mathematics. The fourth column gives the correlation 
coefficients between mathematics aptitude scores and I.I.’s 
for mathematics. The fifth column gives the multiple correlation
coefficients for college algebra grades when account is taken both 
of the mathematics aptitude scores and the I.I.’s for mathe- 
matics. The sixth column gives the percentage of variance in 
college algebra grades which is contributed by the mathematics 
aptitude test. The seventh column gives the percentage of 
variance which is contributed by the I.I.’s, and the eighth column 
gives the percentage of variance still to be accounted for. 

It happens in this table that a sharp distinction can be made 
easily between the correlation coefficients which are significant at 
the one per cent level and those which are not. All coefficients 
whose absolute values are greater than 0.17 are significant at
better than the one per cent level, whereas all coefficients equal
to or less than 0.17 either are significant at the five per cent level
or else are not significant.

In the first row, where the entire group is considered, the 
second column shows that the correlation between college algebra 
grades and mathematics aptitude scores is equal to 0.46 and is 
significant at well above the one per cent level. The percentage 
of variance in the sixth column, which the mathematics aptitude 
score contributes to college algebra grades, is thirty-one per cent, 
the percentage of variance which the I.I.’s contribute is three
per cent and the per cent of variance still to be accounted for 
is sixty-five per cent. The relatively large negative coefficient 
in column 4 of —0.51 between mathematics aptitude and I.I.’s for 
mathematics shows the tendency for increased industriousness on 
the part of students with low mathematical ability and may be 
due to screening effects during the first term which operated to 
eliminate the less industrious students. The correlation coeffi- 
cient in the third column between college algebra and I.I.’s for 
mathematics of .08 shows that if the group is taken as a whole 
there is little if any correlation between grades and work habits. 
This is also shown by the seventh column where the contribution 
of I.I.’s to grades is only three per cent. 
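
The entries for the whole group can be reproduced from the three intercorrelations. The sketch below uses the standard two-predictor formulas for the standardized regression weights and takes each predictor's contribution to be its weight multiplied by its correlation with the criterion. That decomposition is an assumption, since the method of apportioning variance is not stated here, although it does return the tabled percentages for this row.

```python
import math


def multiple_correlation(r12, r13, r23):
    """Two-predictor multiple correlation with a beta-times-r breakdown.

    Subscripts follow Table I: 1 = college algebra grade,
    2 = mathematics aptitude, 3 = index of industriousness.
    """
    denom = 1 - r23 ** 2
    beta2 = (r12 - r13 * r23) / denom  # standardized weight for aptitude
    beta3 = (r13 - r12 * r23) / denom  # standardized weight for the I.I.
    aptitude_share = beta2 * r12
    ii_share = beta3 * r13
    r_squared = aptitude_share + ii_share
    return {
        "R": math.sqrt(r_squared),
        "aptitude_pct": 100 * aptitude_share,
        "ii_pct": 100 * ii_share,
        "unexplained_pct": 100 * (1 - r_squared),
    }


result = multiple_correlation(r12=0.46, r13=0.08, r23=-0.51)
print({name: round(value, 2) for name, value in result.items()})
```

The negative correlation between aptitude and the index raises the multiple correlation well above either single coefficient, which is how a variable that correlates only .08 with grades can still add to the prediction.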

The combination of a smaller correlation between college
algebra and mathematics aptitude than is usually found, the
moderate contribution of mathematics aptitude to achievement,
and the rather high unknown variance is partly due to the
homogeneity of the group, whose members were required to pass
an entrance examination involving mathematics achievement.
Other factors which
tended to lower the correlation coefficient between mathematics 
aptitude and college algebra grades were that only five grades 
were awarded, partly on a subjective basis, and that grades given 
by college instructors are often not as reliable as they might be 
(Krathwohl (2)). 

Previous investigations with indexes of industriousness have 
sometimes shown that using the entire group conceals some 
variations. By resolving that group into smaller ones these 
variations become apparent. 

The first division of the entire group in Table I, into above
average, average, and below average groups, was made on the
basis of the derived scores of individuals on the mathematics
aptitude test. Scores for the above average group were 23 and
above, for the average group from 18 to 22, and for the below
average group 17 and below. The theoretical frequency percentages
for the three groups were twenty-seven per cent, forty-six
per cent and twenty-seven per cent, respectively, or approximately
the highest quarter, the middle half and the lowest
quarter. The divergence of the actual frequency percentages
(thirty-three per cent, forty-two per cent and twenty-five per
cent) from the theoretical frequency percentages may be due to
the screening effect of the entrance examinations, because the
above average group has more than its share, while the average
group has less.

The second division of the entire group into industrious, normal
and indolent groups was made on the basis of the indexes of
industriousness for mathematics. Indexes of industriousness for
the industrious group were 3 and above; for the normal group
from −2 to +2; and for the indolent group, −3 and below.
It can easily be shown that if the aptitude test, the first
achievement test from which I.I.’s are computed, and the indexes of
industriousness for mathematics form normal frequency
distributions, and if derived scores are used, the standard
deviation of the I.I.’s is equal to $4\sqrt{2(1 - r)}$, where r is the
correlation coefficient between the aptitude test and the
achievement test from which the I.I.’s are computed. In this case r
is equal to 0.52. If it is assumed that the I.I.’s are continuous
instead of discrete variables, so that the lower limit for the
industrious group is 2.5 instead of 3, the theoretical frequency
percentages of the industrious, normal and indolent groups are
twenty-six per cent, forty-eight per cent and twenty-six per cent,
respectively, or roughly the highest quarter, the middle half and
the lowest quarter. Since the actual frequency percentages
of the industrious, normal and indolent groups are twenty-five
per cent, fifty per cent and twenty-five per cent, respectively, it is
seen that these frequencies come very close to the theoretical
figures.
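
The theoretical percentages for the work-habit groups follow from the standard deviation formula and the normal curve. The Python sketch below is an illustration under stated assumptions: the I.I.’s are taken to be normally distributed about zero (an assumption suggested by the symmetric cut-offs), derived scores are taken to have a standard deviation of 4 as the formula implies, and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ii_group_shares(r, cut=2.5, derived_sd=4.0):
    """Theoretical shares of the industrious, normal and indolent groups.

    r          correlation between aptitude and first achievement test
    cut        continuous lower limit of the industrious group
    derived_sd standard deviation of the derived scores (assumed 4)
    """
    sd_ii = derived_sd * math.sqrt(2.0 * (1.0 - r))   # SD of the I.I.'s
    tail = 1.0 - normal_cdf(cut / sd_ii)              # share beyond +cut
    middle = 1.0 - 2.0 * tail                         # share between -cut and +cut
    return sd_ii, tail, middle, tail


sd_ii, industrious, normal, indolent = ii_group_shares(r=0.52)
print(f"SD of I.I. = {sd_ii:.2f}")
print(f"industrious {industrious:.0%}, normal {normal:.0%}, indolent {indolent:.0%}")
```

With r equal to 0.52 the standard deviation of the I.I.’s comes to about 3.92, and the three shares round to twenty-six, forty-eight and twenty-six per cent.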

When the entire group is divided into above average, average
and below average groups on the basis of the mathematics
aptitude scores, all the correlations in column 2 between college
algebra and mathematics aptitude are lower than the one for
the entire group. In fact, the two correlations in column 2,
rows 3 and 4 (0.10 and 0.17), not only are low but also are not
significant at the one per cent level. Such a lowering
of coefficients and of their significance may be due to the increased
homogeneity of these three groups with respect to mathematical
ability. The interesting fact is that the three correlation
coefficients in column 3 between achievement and I.I.’s for
mathematics for the division into ability groupings not only are much
higher, but all are significant at well beyond the one per cent level.
Evidently when the students are grouped according to ability
more accurate predictions can be made from work habits than
from ability. Furthermore, a comparison of columns 6 and 7
for the ability groupings shows that the indexes of industriousness
or, in other words, work habits contribute more toward
achievement than mathematics aptitude. However, column
8 indicates that with the ability groupings much more of the
variance is unaccounted for than for the groupings according to
work habits or for the group as a whole.

The three correlations in column 2 for the ability groupings
show that with such groupings, predictions of grade average
on the basis of ability are useless or of little value. It is true that
predictions for achievement can be made instead by means of
work habits, but the multiple correlation coefficients for the
ability groupings are so much higher than any of the zero order
correlations that it is preferable to predict success within ability
groupings by using multiple correlation coefficients. In column 3
the highest coefficients, 0.41 and 0.37, between achievement and
work habits for the above average and average groups,
respectively, show that it is worth while in a counseling situation to
advise students with above average and average scores in
mathematics aptitude of the possibility of securing greater
achievement by increased industriousness. In column 4 for the below
average group, the coefficient of −0.44, which is of appreciable
size, between mathematics aptitude and indexes of industriousness
for mathematics indicates a greater tendency for the below
average group to compensate for their lack of ability by means of
increased industriousness than for the above average and average
groups.

When the original group is divided into industrious, normal and
indolent groups on the basis of indexes of industriousness for
mathematics, the three correlation coefficients between college
algebra grades and mathematics aptitude scores all become, without
exception, higher than any of the other coefficients in column
2, and all are significant at well beyond the one per cent level.
Column 3 for these three groups indicates a negligible correlation
between achievement and work habits, probably because the
effect of work habits has been taken care of in the method of
division itself. Column 4, in which all three correlation
coefficients between mathematics aptitude and indexes of
industriousness for mathematics are negative and significant, shows the
tendency for the less able students to be more industrious than
the others. This is particularly true for the industrious
group, where the negative coefficient of −0.44 indicates the
strongest tendency, as compared with the normal and indolent
groups, for the less able students to compensate for lack of ability
by means of increased industriousness. Columns 6 and 7 show
that with work habit groupings, mathematics aptitude
contributes much more toward success in college algebra than work
habits. In fact, the contributions of work habits are practically
negligible.

In column 8 the large unaccounted-for variance toward 
achievement in college algebra for all divisions and for the group 
as a whole indicates that many other factors are at work toward 
success besides mathematics aptitude and work habits. Some of 
these doubtless are reading ability, general intelligence, and 
methods of awarding grades. 

If zero order correlation coefficients are used, the most accurate 
prediction for achievement in college algebra is made by dividing 
the original group into industrious, normal and indolent groups, 
and predicting achievement separately for each group by means 
of their respective regression equations. 

The question previously proposed, whether the same conclusions
can be drawn for mathematics as were drawn for English, can now
be answered in the affirmative. A comparison of this study for
mathematics with the previous one for English shows a remarkable
similarity between the two in spite of the fact that work habits
for mathematics are independent of those for English. The
principal difference is that all of the correlation coefficients for
English are larger than are those for mathematics. These
differences are undoubtedly due to the use of an objective test with
twenty-one scores in measuring English achievement, whereas
instructors’ grades with a range of five scores were used to measure
achievement in mathematics.

The similarities which were observed between mathematics and 
English are the following: 

1) When the group is considered as a whole, predictions for
achievement are independent of work habits and the contributions
of indexes of industriousness both for mathematics and for
English are negligible. Also, individuals with lower aptitudes
tend to have higher indexes of industriousness.

2) The multiple correlation coefficient for achievement based on 
aptitudes is higher by about 0.12 for both English and mathe- 
matics, than when zero order correlation coefficients are used 
with the respective aptitudes. 

3) When the original group is divided, using aptitude group- 
ings, into above average, average and below average groups the 
following similarities are noticed: (a) Correlation coefficients 
between achievement and aptitude become lower than if the 
single original group is used and in some cases become so low as to 
be not satisfactorily significant. (b) Correlation coefficients 
between achievement and work habits, with one exception, are 
higher for each group for both subjects than are those between 
achievement and aptitude, and all of these correlation coefficients 
are significant at the one per cent level. 

4) When the original group is divided into industrious, normal
and indolent groups on the basis of work habits, the following
similarities are observed: (a) All correlation coefficients between
achievement and ability are higher than are those for ability
groupings and in nearly all cases are higher than when the group
is considered as a whole. (b) Correlations between achievement
and work habits are small and are not statistically significant.
(c) Correlations between aptitudes and work habits on both
subjects are, with one exception, negative, indicating that low
ability students have to work harder and do work harder than
their brighter companions. (d) There is a tendency for the
industrious and normal students to have higher correlation
coefficients between ability and achievement than for the indolent
students. This characteristic is similar to that found by
Hartshorne and May in their studies of the social habits of honesty,
truthfulness and morality (1). They found that the possession
of desirable traits is accompanied by increased consistency and
predictiveness.

5) With both mathematics and English the most accurate 
predictions for achievement are made on the basis of aptitudes 
when the groups are divided into industrious, normal and indolent 
groups on the basis of work habits, provided the predictions are 
made separately for the industrious, normal and indolent groups. 

6) The general conclusion which can be drawn from this study 
is that in spite of the fact that work habits of industriousness for 
English and for mathematics are independent of each other, their 
effect on achievement in their respective subjects is very similar.


READING FOR PROBLEM-SOLVING IN SCIENCE¹
J. HARLAN SHORES and J. L. SAUPE


University of Illinois 


The measurement of reading rate and comprehension has been 
the subject of numerous researches since the development of 
objective measuring devices. Few areas have received as large 
a share of the interest and energies of test-makers. Many of 
these researches were attempts to discover the basic nature of 
the reading process and subsequently to improve the measure- 
ment of this process. The reading tests so constructed and now 
in use are evidently based on the existence of a general ability 
to read (5). In general they include such sections as vocabulary, 
ability to follow directions, ability to grasp the central thought 
of a passage, and general comprehension (word, phrase, sentence 
and paragraph). In grades four, five and six, typical items 
measuring comprehension require the ability to grasp and retain 
facts contained in a short paragraph. 

Within recent years doubt has been cast on the existence,
or at least the value, of the concept of a generalized ability
to read beyond the primary grades (2,3,13,14,15,17). Rather it
has been hypothesized that reading skills differentiate with many
variable factors, each of which, when varied from one test
situation to another, would affect a student’s test score. One
analysis lists fifteen such factors which should be recognized and
either held constant or measured (14). Three factors mentioned as
having received very little consideration in reading test
construction in the past are content area of material being read,
reader’s purpose (what he intends to get from the reading) and
reader’s experience background for the specific content of the
reading passage (14).

Recent investigations leave little doubt that reading rate and
comprehension are affected by the kind of material being read.
It seems reasonable to expect that the reading skills required for
science material will differ from those required for materials of
history, mathematics, or other content areas, each of which
requires its peculiar combination of abilities. Certainly good
readers in one content area can be expected to read well in
other areas and poor readers in one area can be expected to
read poorly in other areas, but there are differences within these
groups which cannot be explained by the concept of a general
ability. Tinker (19) and McCallister (9) support the contention
that reading ability differentiates with the content field in which
the reading is done.

1 This study was made possible by a research grant from the College of
Education, Bureau of Research and Service, University of Illinois. The
research design and conclusions, however, are those of the authors.

The results of these and other investigations which point to the 
denial of a general ability to read regardless of the kind of material 
being read, imply that for reading test scores to be of real value 
they should be reported in terms of ability to read in different 
content areas. There would need be separate tests of ability to 
read historical materials, scientific materials, and the like. Only 
in this manner would it be possible to determine an individual’s 
variable proficiencies in the various content areas. 

Experimental evidence and expert opinion each support the
theory that the specific purpose in the reader’s mind when he
approaches a work-type reading task is a major determinant of
both reading rate and comprehension. More than twenty years
ago Gray pointed to the effect of this factor on reading rate and
comprehension (7). Since that time the results of research and
the considered judgment of scholars in the field of reading have
supported him by reporting a relationship between purpose and
reading rate and comprehension (1,4,17,19). Reading
comprehension includes the ability to adjust the rate of reading and the
specific skills employed to the purpose for which the material is
being read. For test-makers this fact implies that the factor
of reader’s purpose should receive attention in the construction of
reading tests. At present this factor is neglected and neither
the test-maker nor the interpreter of test scores can know how
proficient the readers might have been if they had been reading
for a well defined purpose. Test taking is a special instance of a
learning situation. Since learning involves goal seeking and the
reader’s purpose sets his immediate goals, the test-maker cannot
know what he has measured until he can make fairly valid
assumptions with respect to the similarity of purpose among
the testees. Shores (14) suggested that prior to the printed
passage of the test the reader should be given a clear purpose for
his reading. Similarly, Dolch (6) recommended that since
modern textbooks are written with a purpose in mind, reading
tests should also have a purpose. Ideally reading tests should
be as similar as possible in purpose to the purposes for which
reading is taught.

With respect to experience background of the reader the test- 
maker must assume that the readers have had equivalent experi- 
ence with the specialized subject matter of the reading passage 
in order for the test to be valid in comparing individuals or 
groups. A child with a wealth of experience in aviation will
comprehend a passage about airplanes better and more rapidly 
than one who has not had this experience. This requirement of 
equivalent experience is not easily met and has been violated 
frequently in reading test construction. Methods for meeting 
this requirement would be to select the subject of the reading 
passage in such a manner that it might be assumed that the 
testees have had little if any specific background for it or to use a 
great number of passages with the expectation that the effects 
of experience background would cancel out. At the same time 
the test passage should be typical of the kind of content and 
purpose for which measurement is desired. 

The failure of current reading tests to take these factors into
consideration would prompt the construction of a reading test or 
battery of tests which account for them more adequately. This 
work would culminate with a standardized battery of tests con- 
taining at least one test employing the content of science, one 
using the social studies, one arithmetic, and so on. Each test 
should have a clearly defined purpose stated for the reader at the 
beginning of each reading passage. Every possible attempt 
should be made in selecting the subject of the written material 
to hold the factor of experiential background constant. 

For at least the past three decades, theory of method in ele- 
mentary schools has assumed problem-solving as a primary 
approach. Thus the growth in the training and use of reading 
to solve individual and group problems that is evidenced in 
schools today may be expected to continue. It follows that if 
the new reading tests are to be of maximum use in predicting 
success or in measuring the relative position of an individual or a 
class with respect to the kinds of skills needed for normal class- 
room activity, the individual tests should be measuring reading as 
a tool in problem-solving. In other words, what the individual 
tests would measure is the ability to do that kind of reading
ordinarily done in elementary-school classrooms with the
materials of science, the social studies, and the like. 

As a beginning in the construction of such a test battery there 
is now in the process of refinement a test at the fourth-, fifth-, 
and sixth-grade levels employing the content of elementary-school 
science. It has been called a Test of Reading for Problem- 
solving in Science. When scores on this test are reported and 
analyzed the question naturally arises: What are the relationships 
between that which is measured by this test of reading for prob- 
lem-solving in science and that measured by other reading tests 
and tests of general ability commonly used in the public schools? 


TESTS EMPLOYED 


The Test of Reading for Problem-solving in Science consists of two 
written passages of approximately eight hundred words each. 
In selecting the content for each passage, material was chosen to 
be typical of elementary-school science classes and yet such that 
children probably would not have come into contact with this 
particular content. This was an attempt on the part of the test- 
makers to hold the factor of experience background somewhat 
constant. 

The student is told in the test directions and again immediately 
prior to the reading of each passage, the purpose for which he is 
doing the reading, i.e., the problem he is trying to solve. The
problem of the first selection is, “What is the Best Way for the
Farmer to Keep Grub Worms from Harming His Crops?” The
student is told that he is reading the second passage to find out,
“Do Plants or Animals Like Those on the Earth Live on Mars?”
Following each passage are twenty four-choice multiple-choice
items based on the content of the passage. In general each
of the first nineteen items following each passage requires the 
testee to make inferences from the facts in the passage. Each 
inference is considered to have some relationship to the desired 
solution of the problem. The stem of the final item of each 
part is a statement of the problem the student has been asked to 
solve. There were also four choices for these final items, and the 
responses to them were included in the total test score without 
weighting. The correct alternative to each of these items is the 
solution of the respective problem which follows most logically 
from the passage.

Although reading rate is important within broad limits in the
classroom situation because of the limited length of the school day,
an attempt was made to remove the rate factor by allowing suffi- 
cient time for each student to complete the entire test. For the 
purposes of this study the test scores of the very few students 
(fewer than three per cent) who did not finish the test were con- 
sidered not representative of their ability and hence were not used. 

All test items showed positive discrimination with the upper 
and lower fourths of the sample as judged by the total score. 
The reliability coefficient as computed by the Kuder-Richardson 
formula was +.82. The nature of the test makes an estimate 
of statistical validity impossible because it was designed to 
measure an ability not heretofore measured. Hence logical 
validity is the necessary approach. For a statement concerning 
the logical validity of the test see Husbands and Shores (8) who 
report a study in which this test was used. 
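
The text does not say which Kuder-Richardson formula was used to obtain the coefficient of +.82. As an illustration only, the following Python sketch assumes formula 20 and applies it to a small made-up matrix of right/wrong item scores; the data and the function name are hypothetical and are not taken from the study.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    item_scores: one list per examinee, one 0/1 entry per item.
    Uses the population variance of the total scores.
    """
    n = len(item_scores)          # number of examinees
    k = len(item_scores[0])       # number of items
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n

    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n
        sum_pq += p * (1.0 - p)

    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical data: four examinees, three items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(toy_scores), 2))  # 0.75
```

For the toy data the total-score variance is 1.25 and the sum of the item variances is 0.625, so the coefficient is 1.5 × (1 − 0.5) = 0.75.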

Tests of achievement and mental ability were administered to 
provide data for the relationships to be described later. These 
are as follows: 

1) New California Short-form Test of Mental Maturity, 
Primary or Elementary Battery (the appropriate form was 
administered at each grade level) ’47 S-form. 

2) Progressive Achievement Tests, Primary or Elementary 
(the appropriate form was administered at each grade level) 
Battery, Form A. 

In addition, sociometric measures of acceptance and rejection 
were taken. However, the correlations of these measures with 
all of the other test scores were so low and generally inconclusive 
that they are not included in this report.

A complete list of the scores used for each of the 182 cases is: 

1) Test score, Reading for Problem-solving in Science (referred 
to in the tables as Science Reading) 

2) Mental Age, language 

3) Mental Age, non-language 

4) Reading Age 

5) Arithmetic Age 

6) Chronological Age 


METHOD OF INVESTIGATION 


This study was conducted in a city of approximately eight
thousand population in central Illinois. Classes were chosen in
schools located in the middle socio-economic categories. All the
pupils in each classroom were tested.

The Test of Reading for Problem-solving in Science was
administered to two hundred fourteen fourth-, fifth-, and sixth-grade
children. Of the two hundred fourteen cases for which scores
were taken on the reading for problem-solving test there were
one hundred eighty-two cases for which there were complete
data on the California Tests of Mental Maturity and the
Progressive Achievement Tests.²


TABLE 1. INTERRELATIONSHIPS BETWEEN READING FOR
PROBLEM-SOLVING IN SCIENCE AND OTHER
MEASURED ABILITIES*,**

                               Science  Language  Non-Lang.  Reading  Arith.
                               Reading    M.A.      M.A.       Age      Age     C.A.    Mean     S.D.

Science Reading                  .82      .61       .49        .63      .59     .08
California Language M.A.         .61      .95       .53        .81      .73     .29    124.50   20.39
California Non-Language M.A.     .49      .53       .91        .60      .64     .33    128.29   23.63
Progressive Reading Age          .63      .81       .60        .90      .83     .35    129.11   16.71
Progressive Arithmetic Age       .59      .73       .64        .83      .93     .44    128.53   12.24
C.A.                             .08      .29       .33        .35      .44    1.00    123.06   14.61

* Product moment correlation coefficients uncorrected.
** Self-correlations are reliability estimates.





2 This discrepancy is due largely to absences when the various tests were
administered and children moving into and away from the school district
during the period when data were collected.

It is important to note that the data from all the tests except
the Test of Reading for Problem-solving in Science were collected
from the same students one full year previous to this investigation
in connection with another study.³ Consequently, any marked
irregularities in rate of growth of these attributes during the
intervening year would disturb the results reported.


TABLE 2. CORRECTED INTERRELATIONSHIPS BETWEEN READING
FOR PROBLEM-SOLVING IN SCIENCE AND OTHER MEASURED
ABILITIES*

                      Science   M.A.      M.A.       Reading  Arith.
                      Reading  Language  Non-Lang.     Age      Age     C.A.

Science Reading                  .69       .57         .73      .68     .09
M.A. Language           .69                .58         .88      .78     .30
M.A. Non-Language       .57      .58                   .66      .70     .34
Reading Age             .73      .88       .66                  .91     .37
Arithmetic Age          .68      .78       .70         .91              .46
C.A.                    .09      .30       .34         .37      .46

* Product moment correlation coefficients corrected for attenuation.


The data reported in Table 1 are product moment correlation 
coefficients computed from the raw scores. These data do not 
account for differences possibly due to the relative reliability of 
the measuring instruments. The data in Table 2 are corrected 
for attenuation and provide a better estimate of the true
relationships among these factors.⁴,⁵
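
As a check on the correction described in the footnote, the following Python sketch divides each uncorrected correlation by the square roots of the two reliability estimates. It uses the Science Reading correlations and the self-correlations from Table 1; the function and variable names are illustrative and not the authors’.

```python
import math


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Corrected correlation: r_xy / (sqrt(rel_x) * sqrt(rel_y))."""
    return r_xy / (math.sqrt(rel_x) * math.sqrt(rel_y))


SCIENCE_RELIABILITY = 0.82  # self-correlation of Science Reading, Table 1

# (uncorrected r with Science Reading, reliability of the other measure)
table_1 = {
    "M.A. Language": (0.61, 0.95),
    "M.A. Non-Language": (0.49, 0.91),
    "Reading Age": (0.63, 0.90),
    "Arithmetic Age": (0.59, 0.93),
    "C.A.": (0.08, 1.00),
}

corrected = {
    name: round(correct_for_attenuation(r, SCIENCE_RELIABILITY, rel), 2)
    for name, (r, rel) in table_1.items()
}

for name, value in corrected.items():
    print(f"{name:<18} {value:.2f}")
```

Rounded to two places, the sketch yields .69, .57, .73, .68 and .09, which agree with the Science Reading entries of Table 2.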





3 G. Orville Johnson. “A study of the social position of mentally
handicapped children in the regular grades.” American Journal of Mental
Deficiency, 55, No. 1, July, 1950.

4 The formula employed to correct for attenuation is

$$r_c = \frac{r_{xy}}{\sqrt{r_{x_1x_2}}\,\sqrt{r_{y_1y_2}}},$$

where $r_c$ is the corrected correlation coefficient, $r_{xy}$ is the uncorrected
product moment correlation coefficient, and $r_{x_1x_2}$ and $r_{y_1y_2}$ are the respective
reliability estimates of the two instruments whose scores are being correlated.

5 Quinn McNemar. Psychological Statistics. New York: John Wiley and
Sons, 1949, p. 134.


RESULTS


Limitations of sampling and the fact that the Test of Reading 
for Problem-solving in Science is still being revised suggest that 
this type of study should be repeated at a later date when more 
adequate instruments are available. At this stage factor 
analysis would seem to be appropriate to discover what the 
relationships are among the Reading for Problem-solving in 
Science test, current reading tests, and other tests of various 
mental abilities. The generalizations from the present study 
should be regarded as tentative. While many of the correlation 
coefficients reported in Table 2 are not statistically different from 
one another and others are not significantly different from zero,⁶
there may be value in pointing to some relationships which seem 
to exist between reading for problem-solving in science and the 
other measured factors. 

1) Intercorrelations among the first five measures listed in 
Tables 1 and 2 (Reading for Problem-solving in Science, Mental 
Age—Language, Mental Age—Non-Language, Reading Age, and 
Arithmetic Age) are significantly positive in each instance. 
This indicates some general ability measured in common by these 
tests and which is also present in the Test of Reading for Problem- 
solving in Science. 

2) Reading to solve problems in science correlates highest with 
reading age and higher with language mental age than with non- 
language mental age. A major factor in this type of reading is 
probably the reading-language factor. 

3) Reading for problem-solving in science correlates lower with
each of the others of the first five factors listed in Table 2 than do
language mental age and reading age. The important implication
here is that this test is measuring an ability which has less of
the general factor causing high intercorrelations among all of
them. The suggestion is that ability to do the work-type
reading required by problems in science, a reading skill which
involves both reading and thinking critically about that which is
read, is more independent of mental age than is general reading
ability and is different in some degree from whatever is measured
in tests of general verbal intelligence and general ability to read.





6 Correlations within the range of +.18 to −.18 are not regarded as
significantly different from zero at the one per cent level of probability.

4) Ability to read to solve problems in science correlates
significantly lower with chronological age than do any of the
other measures of mental ability or achievement. This suggests
that this ability is nurtured less by maturation and incidental
cultural impact than are the other measured abilities. It also
suggests that significant development of this ability probably
requires deliberately planned learning situations not uniformly
provided in the schools employed in this experiment.

5) Ability to read in order to solve problems in science correlates
lower with Reading Age and Language Mental Age than
do these two abilities with one another. Again the indication is 
that this ability is somewhat dissimilar to that measured as 
verbal intelligence or general ability to read. 


SUMMARY 


Considerable evidence is accumulating to support the hypoth- 
esis that reading ability differentiates beyond the primary 
grades into somewhat specific abilities to read different kinds of 
material for different purposes. Research along these lines 
continues to be hampered by the lack of adequate instruments for 
measuring whatever form these differentiated abilities assume. 
This investigation, using an instrument which after considerable 
development is still being revised, tends to support the hypoth- 
esis that reading of the kind employed in grades four, five and 
six to solve problems in science has a large factor in common with 
mental ability and general achievement as these are commonly 
measured and yet is somewhat unique in a manner which cannot 
be accounted for by these generalized factors. A reasonable 
prediction is that sharper measuring instruments will not only 
substantiate the hypothesis that general ability to read does 
differentiate into specific abilities, but will also describe the 
extent and nature of this differentiation and the amount and 
character of the remaining common general factors.


INTERRELATIONSHIPS AMONG MATERIALS
READ, WRITTEN AND SPOKEN BY PUPILS 
OF THE FIFTH AND SIXTH GRADES 


HENRY R. FEA 
Sacramento State College, Sacramento, California 


The close relationships among the four language arts—speak- 
ing, listening, reading and writing—have been explored by 
research workers and advocated by curriculum-makers. A 
more intensive study of interrelationships among speaking, 
reading, and writing was attempted in the present study by 
applying nine measures to materials read, written and spoken on 
the same topic by a group of fifth- and sixth-grade pupils. 
The study employed measures which had previously been used 
for reading, oral language or written language and applied them 
simultaneously to samples of all three forms of communication. 
This made possible comparisons of the different language arts 
abilities at these grade levels and an analysis of the measures 
themselves. 

More specifically, the study attempted to answer the following 
questions: 

1) What is the level of development in each of the three lan- 
guage arts for the same children in grades five and six? 

2) Does varying the oral-written order of reproduction affect 
the quality of oral or written samples of pupils’ work? 

3) Does the developmental level of oral and written samples 
vary more with level of material read than with reading ability of 
the pupil? 

4) Is level of development revealed by one measure comparable 
with that revealed by others? 

5) Are any measures suitable as multiple-measures of the 
different language arts? 

6) If measures prove suitable as multiple-measures, what is the 
order of development in each factor considered? 


PREVIOUS STUDIES 


There have been few studies which attempted to measure 
development in more than one of the language arts simultaneously.
Lorge and Kruglov(18) investigated the relationship
between intelligence and readability level of written compositions
for eighth- and ninth-grade pupils. They noted that readability 
level of compositions was approximately two grades below 
expected reading status. 

Bushnell(2) attempted an analytical contrast of some factors of 
the oral and written English of tenth-grade pupils. Each pupil 
gave a short narrative oral composition. Later, the pupils wrote 
themes on the same subject matter. Bushnell found only nine 
cases in which oral compositions were superior as measured by the 
opinion of judges using the Van Wagenen Composition Scale. 
Errors in sentence structure were more numerous in oral compositions.
There were on the average three overloaded or disjointed
sentences in every oral theme, but only one such sentence in every
eight written themes. Bushnell gives his opinion that oral
language is less subject to training in the schools and remains on
an immature level as judged by number of words, number of
sentences and number of words per sentence as well as in general 
quality. He admits that sentence length may not be a valid 
measure because of its dependence on punctuation. 

Schonell(25) found more cases of backwardness in written 
language than in spelling or reading. He suggested environment 
as a reason, stating that reading and spelling are more dependent 
on direct teaching. He believes that reading affects vocabulary 
in oral and written language by unconscious assimilation. This 
does not operate with the same potency on the subtler character- 
istics of sentence structure. Thus, style and structure do not 
transfer to the same extent as vocabulary. 

Dow and Papp(é) investigated relationships among test scores 
of reading ability, language ability and grades in fundamentals of 
speech, public speaking and literary interpretation. Their 
subjects were students in sophomore English courses. Reading 
scores and scholastic aptitude scores were determined from tests 
given in freshmen year. Scores on fundamentals of speech, 
public speaking and literary interpretation were taken from grade 
books of instructors. They admit that, in light of measures used, 
validity of their findings may be open to question, but conclude 
that no significant relationships appeared among reading ability, 
language ability and speech ability. 

Lemon and Buswell(16) investigated errors in oral and written
expression of twenty ninth-grade children. Oral samples were
recordings of informal conversation made by a concealed microphone.
Written samples were themes obtained as class assignments. 
For comparison they equalized the two samples from each pupil to 
the sample containing the smaller number of words. Since the 
coefficient of correlation between oral and written errors was 
-.289, they questioned methods of teaching which they felt
yielded no transfer in a situation conducive to transfer. 

Watts(29) studied oral and written language development in 
children. He contends that success in written language is 
dependent upon success already having been achieved in oral 
language. He believes that children will not use the prearranged 
form characteristic of good prose but use the style of everyday 
speech which has no prearranged form. Thus he infers that 
speech is a groping process of clarification of thought, and that 
this is not true of written language. This point may be open to 
question. Careful writers ‘get something down,’ then rewrite for 
good form. 

Mathews, Larsen and Gibbon(19) investigated the importance
of reading ability for freshmen taking rhetoric classes. They 
performed three experiments involving teaching composition 
purely by reading materials. Scholastic aptitude, reading and 
rhetoric levels showed all of the high grade group to be in the 
upper quarter in reading skill. The low grade group was only 
slightly above average in reading skill. All experimental groups 
showed appreciable improvement in reading and no appreciable 
lag in grammar as compared to the control group who were given 
direct instruction in rhetoric. 

Rossignol(21) explored relationships among hearing acuity, 
speech production and reading performance of primary-grade 
children. Hearing acuity was tested by a pure-tone audiometer, 
speech production by two examiners using an articulation test 
and a sound-repetitions test. Reading performance was checked 
by the Gates Primary Reading Test. She found a small but 
significant relationship between reading performance and speech 
production. 


THE MEASURES 


The experimental material consisted of three samples of lan- 
guage from each pupil: 
1) A transcription of the oral reading by each pupil of the story
Golden Harvest by Elizabeth Yates(23). This story had been
adapted previously by use of the Lorge Formula(17) to an exact
4.5 reading level. 

2) A transcription of the oral reproduction of the same story by 
each pupil. 

3) A transcription of the written reproduction of the same 
story by each pupil. 

The samples were then analyzed by the following measures: 

1) Vocabulary—number of words. 

2) Vocabulary—number of different words. 

3) Vocabulary—number of words not found on the Dale list(3). 

4) The Type-token Ratio. This measure was used by Johnson(14).
It is an expression of the ratio of the number of different
words (types) to the total number of words (tokens).

5) The Lorge formula for readability(17). This is a measure
of the reading difficulty of materials written for children, using the 
number of words, number of difficult words, number of sentences 
and number of prepositional phrases. 

6) The mean and standard deviation of sentence length. This 
measure was used by Schonell(24). 

7) Degree of subordination. This measure was used by La
Brant(15). It is expressed as a ratio of the number of dependent
clauses to the number of independent clauses. 

8) Number of prepositional phrases. This measure has been 
used widely; one example is that of Watts(29).

9) Some measure of ideas expressed. 
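
As a rough illustration of how the purely countable measures above might be obtained from a transcript, the following Python sketch derives measures 1, 2, 4 and 6 from one text sample. The counting rules (lower-casing, splitting sentences on terminal punctuation, using the population standard deviation) and the function name `basic_measures` are assumptions made for this example; the study does not state its counting rules in this detail.

```python
import re
import statistics


def basic_measures(text):
    """Word count, different-word count, type-token ratio, and
    mean/SD of sentence length for one language sample."""
    # Assumption: a sentence ends at ., ! or ? as punctuated in the sample.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: words are runs of letters or apostrophes, compared in lower case.
    per_sentence = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    per_sentence = [words for words in per_sentence if words]
    words = [w for sentence in per_sentence for w in sentence]
    if not words:
        raise ValueError("sample contains no words")
    lengths = [len(sentence) for sentence in per_sentence]
    return {
        "words": len(words),                              # measure 1
        "different_words": len(set(words)),               # measure 2
        "type_token_ratio": len(set(words)) / len(words), # measure 4
        "mean_sentence_length": statistics.mean(lengths), # measure 6
        "sd_sentence_length": statistics.pstdev(lengths), # measure 6 (population SD assumed)
    }


# Hypothetical three-sentence sample, not taken from the study.
sample = "The boy cut the wheat. The wheat was gold. He sold the wheat in town."
m = basic_measures(sample)
```

Measures that depend on judgment or on an external word list (hard words, degree of subordination, prepositional phrases, ideas expressed) are left out of the sketch because they cannot be counted from the text alone.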


THE SUBJECTS 


The one hundred forty cases were selected from children of the 
fifth and sixth grades of four elementary schools in two California 
cities. Basis for selection of subjects was: 

1) Reading ability of grade three or better as revealed by 
results of the Van Wagenen Unit Scales of Attainment. 

2) All children were of the white race and came from homes 
where English was the only language spoken. 

3) Normality of sight, speech and hearing as revealed by 
school records. 

Since it was possible to vary the order of reproduction of oral 
and written samples, and because variation of such order might 
affect the quality of the samples, two groups were used. Group 
A followed the order: reading, telling, writing; Group B followed
the order: reading, writing, telling. The groups were equated
on a basis of sex, grade, age, reading grade scores, and socio- 
economic background. 


CONDUCT OF THE EXPERIMENT 


Each subject was taken to a small room in his school containing 
an Audograph recorder. The pupil was informed of the three 
tasks expected of him, shown the operation of the machine, and 
told that his reading and telling of the story would be recorded. 
For recording, the child remained seated, and a lapel microphone 
was used. Each child was given two minutes to organize his 
thoughts before telling the story he had just read. To write 
the story, the pupil was taken to a second small room containing 
several desks. 

When all samples had been procured they were subject to the 
following transcription and analysis: 

1) Oral reading recordings—analyzed according to the Gray 
Oral Reading Analysis(8) for fluency (time required to read the 
story), mispronunciations, omissions, substitutions, insertions, 
repetitions, reversals, and faulty phrasing (excessive pausing 
where no pause is indicated in the text, as used by Hahn(9)). 

2) Oral reproduction recordings—analyzed for repetitions, 
unintelligible remarks, punctuation, number of words, number of 
different words, number of words not appearing on the Dale 
list(3), number of prepositional phrases, number of sentences, 
number of run-on sentences, number of incomplete sentences (as 
defined by Hoppes(12)), degree of subordination, and number
of correct verbal memories (this is a measure of the number of
reproduced facts).

3) Written reproductions—analyzed for the same factors as 
oral reproductions with exception of repetitions and unin- 
telligible remarks. 


ANALYSIS OF THE RESULTS 


An analysis of the original story as read by the children was 
made for comparison with their later oral and written samples. 
Results of this analysis are given in Table I. 

The second step was statistical analysis of the children’s 
language samples. Are the measures appropriate as measures of 
material read, spoken and written? Does application of these
measures show relationship among the language samples?
Two statistical measures are used in an attempt to answer the
second question: correlation, to reveal the degree of relationship,
and the level of significance of the difference between means when
a measure is applied to two of the media. Obtained correlation
coefficients are listed in Table II. Means, standard deviations and
critical ratios between means are given in Table III.


TABLE I.—RESULTS OF MEASURES APPLIED TO THE STORY,
Golden Harvest


Measures                            Results
Total number of words                750
Total number of different words      327
Total number of hard words            70
Total number of phrases               65
Total number of sentences             63
Degree of subordination                 .39
Total number of facts                109
Average sentence length               11.90
SD of sentence length                  5.97
Type-token ratio                        .44
Lorge grade rating                     4.50


From these results some twenty conclusions may be advanced 
about the interrelationships of the three language arts and their 
measures: 

1) Relationship among the number of minutes to read, tell and 
write the story. This measure is misleading because a pupil who 
speaks or writes quickly with occasional long pauses receives the 
same score as one who speaks or writes slowly with no long pauses. 
Table II shows negligible correlation except for oral and written 
reproduction correlation coefficient of .5. Investigations have 
shown positive correlation of reading comprehension and speed. 
Therefore, pupils who read quickly should have greater degree of 
comprehension and remember more. Because they remember 
more they should have more to tell and write. Thus, reading 
time should correlate negatively with oral and written reproduction
time. This reasoning disregards the possibility of a general
verbal fluency factor which would tend to produce positive
time relationships. Perhaps both factors operate to cancel the
effect of either. Mean reading speed is approximately one
hundred seventeen words-per-minute; oral reproduction speed,
one hundred eight words-per-minute; written reproduction, eight
words-per-minute. The last figure is low in relation to speed
of handwriting usually quoted for these grade levels.


TABLE II.—CORRELATION COEFFICIENTS OF RELATIONSHIPS OF
LANGUAGE ARTS FACTORS


Factors

Total number of verbal memories: oral with written
Total number of different words: oral with written
Total number of hard words: oral with written
Total number of phrases: oral with written
Total number of words: oral with written
Total number of minutes for reproduction: oral with written
Total number of sentences: oral with written
Type-token ratio: oral with written
Reading grade with number of written verbal memories
Reading grade with number of oral verbal memories
Number of repetitions in reading with number of repetitions in oral reproduction
Excessive phrasing in reading with excessive phrasing in oral reproduction
Average sentence length: oral with written
Lorge rating: oral with written
Reading grade with Lorge rating of oral reproductions
Number of minutes for reading with number of minutes for oral reproduction
Number of minutes for reading with number of minutes for written reproduction
Number of mispronunciations in reading with Lorge rating of written reproductions
Reading grade with Lorge rating of written reproductions
Number of mispronunciations in reading with number of oral verbal memories
Number of mispronunciations in reading with number of written verbal memories
Number of mispronunciations in reading with Lorge rating of oral reproductions

2) Relationship of reading grade to number of oral and written
verbal memories.—The positive relationship shown in Table II
would appear normal. Most reading tests make considerable
demand upon immediate memory. It would seem natural that
a pupil receiving a high oral score would receive a high written
score. The critical ratio of 2.66 is of such magnitude that
separate norms would be necessary if the measure were used for
both oral and written language.


TABLE III.—MEANS, STANDARD DEVIATIONS AND CRITICAL
RATIOS BETWEEN THE MEANS OF FACTORS IN ORAL
REPRODUCTIONS AND WRITTEN REPRODUCTIONS
OF ONE HUNDRED FORTY PUPILS


                                    Oral Samples    Written Samples   Critical
                                     Mean      SD      Mean      SD     Ratio
Total number of words              291.21  124.15    196.48   81.36      7.55
Number of hard words                18.26    7.96     13.38    6.31      5.67
Number of different words          115.39   39.16     94.16   31.02      5.03
Number of sentences                 15.21    6.91     11.35    6.02      4.99
Number of phrases                   19.77   10.09     15.47    7.82      3.98
Number of run-on sentences           1.96    1.82      1.33    1.38      3.26
Number of verbal memories           35.01   15.45     30.38   13.60      2.66
Degree of subordination               .32     .09       .32     .10       .01
Average sentence length in words    19.99    5.86     22.85   23.01     -1.43
Number of incomplete sentences        .29     .62       .43     .82     -1.57
Lorge rating                         4.50     .37      4.81    1.41     -2.53
Type-token ratio                      .43     .08       .50     .08     -8.17

3) Relationship of the number of reading mispronunciations to 
the number of oral and written memories.—Table II shows negli- 
gible relationship. Probably two opposing factors produce the 
result. If a pupil cannot pronounce a word and does not know 
its meaning he will not use it orally or in writing; if he knows the 
meaning but has difficulty with the pronunciation, extra time
spent trying to pronounce it should fix it in his memory. 

4) Relationship of the number of reading mispronunciations to 
Lorge rating of oral and written samples.—The Lorge formula 
seems excellent as used with oral samples. Its use in written 
language samples of children who show no evidence of sentencing 
is questionable. 

5) Relationship of reading grade to Lorge rating of oral and 
written reproductions.—Results, shown in Table II, indicate 
little relationship between pupil reading-level and level of 
difficulty of style of his oral and written samples. However, they 
may indicate that level of material read has more influence on 
level of maturity of reproductions than reading level of the pupil. 

6) Relationship of excessive phrasing in reading to excessive 
phrasing in oral reproduction.—Results support the findings of 
Hahn(9) that it is a habit. 

7) Relationship of the number of repetitions in reading to the 
number of repetitions in oral language.—Table II shows a definite 
tendency for those who repeat in reading to do the same in oral 
language reproduction. 

8) Relationship of the total number of words used in oral and 
written language situations.—Results of Table II indicate a 
definite tendency for those who use more words orally to do the 
same in writing. In only seven cases did a pupil write more 
words than he spoke. The critical ratio is of such magnitude 
as to require different norms for the two media but the measure 
appears to be valid. 

9) Relationship of the number of different words used in oral 
and in written language samples.—The relationship here is 
greater than for total number of words. But, again, separate 
norms would be necessary. Pupils in this situation used in 
speaking approximately thirty-five per cent, and in writing 
approximately twenty-nine per cent of the number of different 
words encountered in reading. 

10) Relationship of the number of hard words in oral and 
written language.—Results are similar to those of the two 
previous measures so this measure is probably superfluous in 
measuring similarities when the other two are used. 

11) Relationship of the number of phrases in oral and written 
language.—Studies in oral language have been limited to consideration
of the age at which phrases appear in the language of
the child. Stormzand and O’Shea(27) used the measure in 
written language. On the basis of their findings the present study 
should have shown the mean number of phrases in oral and 
written language as approximately thirty-five and twenty-four. 
Actual results as shown in Table III are 19.77 and 15.47 phrases 
per sample. Perhaps difference in material would account for 
this discrepancy as Stormzand and O’Shea used original material. 
Table II shows a definite relationship for this measure; therefore, 
it is valid but separate norms would be necessary. 

12) Relationship of the number of sentences in oral and written 
language.—In this study the pupil's own indication of sentence
boundaries was accepted. Previous investigators have
considered the thought-unit as synonymous with a sentence.
This makes the sentence count subjective and it has been condemned
by Watts(29), Johnson(14), Seegers(26) and La Brant(14)
for this reason. In this study the sentence-unit appears sufficiently
valid in oral language to justify its use. But due to
inability of a few pupils to use punctuation, results obtained from
written samples are questionable. Results are at variance with
those of Bushnell(2) and Bear(1) in that the present study shows
more sentencing.

13) Relationship of the number of run-on sentences in oral 
and written language.—Results shown in Table III substantiate 
the findings of Wiswall(30). The measure is vulnerable to the 
extent that the definition of a sentence is subjective. 

14) Relationship of the number of incomplete sentences in oral 
and written language.—This measure tends to be more objective 
than a sentence count. Table III shows this to be the first 
measure considered where the mean for written samples exceeds 
that for oral samples. Also, this measure could be applied in 
both media using the same norms. However, it is an unsuitable 
measure at these grade levels as only thirty-one oral and thirty- 
nine written samples contained incomplete sentences. 

15) Relationship of the degree of subordination in oral and 
written language.—This measure has been used by many previous 
investigators such as Heider and Heider(10) and La Brant(14).
From evidence of Tables II and III this would appear to be a 
suitable measure of oral and written language samples. The 
same norms would be suitable for both media. 
16) Relationship of the number of correct verbal memories in
oral and written language.—There is high degree of relationship 
here. 

17) Relationship of the average sentence length in oral and 
written language.—There is, according to Table II, low degree of 
relationship. The questionable validity of the sentence as a 
measure has been previously discussed. 

18) Relationship of the standard deviation of sentence length 
in oral and written language.—The result for oral language is similar
to that of the original story; the result for written language is very largely
due to lack of pupil punctuation. 

19) Relationship of the type-token ratio in oral and written 
language.—There is close similarity among the original story, 
oral samples and written samples. However, this measure is 
invalid because of the high rate of repetition of some words in 
the English language. A pupil who writes one sentence will 
obtain a higher type-token ratio than an author who writes a 
book. It could be a valid measure if an identical number of 
words were allowed for each sample. 

20) Relationship of the Lorge grade rating in oral and written 
language.—The low degree of relationship as indicated in Table 
II is to be expected because of the dependence of the Lorge rating 
on sentencing. 
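
The objection raised in item 19), that the type-token ratio depends on sample length, can be illustrated with a short Python sketch. The two word lists are invented for the example, and the function names and the idea of truncating every sample to its first `n` words are assumptions about how "an identical number of words" might be enforced; the study does not prescribe a procedure.

```python
def type_token_ratio(words):
    """Number of different words (types) divided by total words (tokens)."""
    return len(set(words)) / len(words)


def equated_ttr(words, n):
    """Type-token ratio over the first n words only, so that samples of
    unequal length are compared on the same number of tokens."""
    if len(words) < n:
        raise ValueError("sample is shorter than the equating length")
    return type_token_ratio(words[:n])


# Hypothetical samples: a three-word utterance and a fourteen-word one.
short = "the dog ran".split()
longer = "the dog ran to the barn and the dog sat by the barn door".split()

ttr_short = type_token_ratio(short)
ttr_longer = type_token_ratio(longer)
ttr_longer_equated = equated_ttr(longer, len(short))
```

In this toy case the longer sample has the lower raw ratio only because common words recur as it grows; once both samples are cut to the same length the ratios coincide.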


SUMMARY OF ANALYSIS OF LANGUAGE SAMPLES 


It would seem, from material presented in the preceding section, 
that there is substantial degree of relationship among reading 
material, oral language samples and written language samples of 
fifth- and sixth-graders in the following: verbal memories 
evoked, number of words used, number of different words 
employed, number of hard words, number of phrases and degree of 
subordination. Further, in situations similar to this study, such 
factors may be reliably measured. If such measures are used, 
different expectations or norms must be established for all factors 
except degree of subordination. 

Factors considered in the preceding section which indicate some 
degree of relationship among the three media are: number of 
minutes for reading and reproduction, number of sentences, and 
type-token ratio. These have not proved suitable as measures 
of all three types of language behavior. While the number of
minutes for oral and written reproduction are positively related,
there is little relationship of reading time with oral and written
reproduction time. Differences of opinion as to what constitutes
a sentence invalidate the sentence count as a measure. The type-token
ratio is not a valid measure unless all samples are equated for number
of words.


TABLE IV.—MEANS, STANDARD DEVIATIONS AND CRITICAL
RATIOS OF DIFFERENCES BETWEEN THE MEANS OF ORAL
LANGUAGE SAMPLES AND WRITTEN LANGUAGE SAMPLES
FOR SEVENTY PUPILS IN EACH OF GROUPS 'A' AND 'B'


                                       Group A           Group B      Critical
                                     Mean      SD      Mean      SD     Ratio
Oral reproductions
  Total number of words            272.84  120.91    309.58  124.64      1.77
  Number of different words        111.38   38.17    119.40   39.73      1.22
  Number of hard words              17.41    8.28     19.10    7.57      1.26
  Number of phrases                 18.78   10.02     20.76   10.01      1.16
  Number of sentences               13.64    6.38     16.78    7.06      2.76
  Number of run-on sentences         2.00    1.80      1.91    1.83       .28
  Number of incomplete sentences      .00     .67       .26     .55       .26
  Degree of subordination             .31     .10       .32     .09       .91
  Number of verbal memories         34.18   15.59     35.84   15.26       .64
  Average sentence length in words  20.79    5.85     19.20    5.74       .63
  Type-token ratio                    .44     .09       .42     .06      1.95
  Lorge rating                       4.58     .30      4.42     .38       .14
Written reproductions
  Total number of words            188.63   72.72    204.33   88.49      1.15
  Number of different words         91.30   27.48     97.01   33.98      1.09
  Number of hard words              12.97    5.46     13.80    7.01       .78
  Number of phrases                 15.17    7.32     15.77    8.29       .45
  Number of sentences               10.73    6.36     11.97    5.58      1.22
  Number of run-on sentences         1.14    1.21      1.51    1.50      1.61
  Number of incomplete sentences      .30     .66       .58     .94      1.88
  Degree of subordination             .33     .11       .31     .09      1.23
  Number of verbal memories         30.01   12.66     30.76   14.45       .32
  Average sentence length in words  26.00   30.31     19.71   10.82      1.64
  Type-token ratio                    .51     .07       .50     .08       .34
  Lorge rating                       5.04    1.84      4.58     .69      1.98

Factors which indicate little relationship among the media or 
are not applicable to more than two of the media are: average 
sentence length, the Lorge rating, mispronunciations, repetitions 
and excessive phrasing. Average sentence length and Lorge 
rating are suitable measures, but both are dependent on the 
definition of a sentence. 


COMPARISON OF THE TWO EQUATED GROUPS 


Statistical evidence of the result of comparison of oral and 
written language performance on the basis of order of presentation
is given in Table IV. Group A first reproduced the story
orally, then in writing. This order was reversed with Group B. 

The only significant difference is in the number of sentences in 
oral samples. The writer is of the opinion that pupils who have 
just read the story rush through oral reproduction in the hope that 
facts will not be forgotten. Those who write the story prior to 
oral reproduction have undergone a sufficient time-lapse to 
assure that facts still remembered will remain so for a period of 
time. 

Although no significant differences exist, with the exception of 
oral sentencing, the general trend as shown in Table IV is
interesting. On those measures which were accepted as valid in 
the preceding section, Group B is superior in all but degree of 
subordination. Thus, according to evidence from the present 
study, pupils who write the story before telling it use, in both 
oral and written samples, a greater number of words, a greater 
number of different words, a greater number of hard words, a 
greater number of phrases and a greater number of verbal 
memories. 
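
The article does not state how its critical ratios were computed. Several entries in Table IV are, however, reproduced by the usual large-sample ratio of a difference between two means to the standard error of that difference for independent groups. The Python sketch below applies that formula to two oral-reproduction rows; the formula itself and the function name `critical_ratio` are assumptions, not something the source confirms.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two group means divided by the standard error
    of the difference, treating the groups as independent (assumed)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# Oral reproductions, Table IV: Group A vs. Group B, seventy pupils each.
cr_words = critical_ratio(272.84, 120.91, 70, 309.58, 124.64, 70)
cr_sentences = critical_ratio(13.64, 6.38, 70, 16.78, 7.06, 70)

print(round(cr_words, 2), round(cr_sentences, 2))
```

Under this assumption the two computed values agree with the tabled ratios of 1.77 for total number of words and 2.76 for number of sentences, the one difference the writer treats as significant.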


CONCLUSIONS 


1) The level of development in the three language arts appears 
to be in the order: material read, oral language and written 
language. However, this result in a reproductive situation does 
not apply with equal validity to any other combination of 
language activities because the assumption is that the material 
read is the standard. 

2) Varying the order of oral and written reproductions does 
not affect the quality of the samples, except for the number of 
oral sentences, to a significant degree. However, there is a 
general trend toward superior language usage in both oral and 
written samples when written reproduction is performed first. 

3) The level of development of oral and written language is 
more dependent upon the level of the material read than upon the 
reading level of the pupils according to evidence of this study. 
The average level of maturity for oral language samples is identi- 
cal with the difficulty of the passage read, both with Lorge rating 
of 4.5. Further experiments using reading material on various 
levels of difficulty would be necessary before this statement can be 
made with any degree of certainty. 

4) Level of development revealed by one measure is not com- 
parable with that revealed by another. Studies comparable 
to the present one have not been sufficient to establish levels of 
comparison among the measures. For example, it is not possible 
to state that one sample containing fifty more words but five fewer 
phrases than another sample is of less, equal or greater degree 
of language maturity. 

5) Measures which appear suitable as multiple-measures, and
which could be applied to material read, spoken and written in
situations and with material comparable to the present study,
are: verbal memories, number of words, number of different
words, number of hard words, number of phrases and degree of 
subordination. The Lorge formula may prove to be an accepta- 
ble multiple-measure although such was not clearly the case in 
the present study. All of these measures except degree of 
subordination would require separate norms for the three media. 

6) Assuming that the measures are suitable multiple-measures, 
the level of maturity of development in each factor could be 
determined only by further investigations. Comparable studies
are not sufficient to establish levels of development. However, 
the present study indicates that, in similar situations, the mean 
number of written-language words may be approximately two- 
thirds that of oral reproduction; number of different words in 
oral samples may be about one-third the number of words in 
reading material; and the number of hard words one-tenth the 
total number of words, with written samples comparable on a 
reduced scale. There is further indication that in similar situa- 
tions pupils may be expected to produce approximately one- 
third of the facts which they have read orally and that the ratio 
of subordinate clauses to total number of clauses used may be 
approximately one-to-three. 

7) The hypothesis of Hahn(9), that excessive phrasing in oral 
reproduction is a habit caused by nervousness or excitement and 
tending to persist, appears to be substantiated here to the extent 
that this factor is related to reading and oral reproduction. The 
same relationship seems to be true for repetitions in oral reading 
and oral reproduction. 

8) From evidence of the present study, the best single index of 
measurement in reading material, oral and written language 
samples would appear to be the degree of subordination. 
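
Several of the proportions quoted in conclusion 6), and in item 9) of the analysis, can be checked directly against the figures in Tables I and III. The Python sketch below does that arithmetic; the dictionary keys are labels invented for the example, while the numbers are the tabled values for the story and for the oral and written means.

```python
# Values copied from Table I (the story) and Table III (means of 140 pupils).
story = {"different_words": 327, "facts": 109}
oral = {"words": 291.21, "different_words": 115.39, "verbal_memories": 35.01}
written = {"words": 196.48, "different_words": 94.16, "verbal_memories": 30.38}

# Written words as a share of oral words ("approximately two-thirds").
written_to_oral_words = written["words"] / oral["words"]

# Different words used, as a share of the different words in the story
# ("approximately thirty-five per cent" and "twenty-nine per cent").
oral_share_of_story_vocabulary = oral["different_words"] / story["different_words"]
written_share_of_story_vocabulary = written["different_words"] / story["different_words"]

# Facts reproduced orally as a share of facts in the story
# ("approximately one-third").
oral_share_of_facts = oral["verbal_memories"] / story["facts"]
```

The four quotients come to roughly .67, .35, .29 and .32, in line with the rounded fractions given in the text.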


IMPROVEMENTS IN READING RATE AND
COMPREHENSION OF SUBJECTS 
TRAINING WITH THE 
TACHISTOSCOPE 


HENRY P. SMITH and THEODORE R. TATE 


Considerable interest has been aroused within the past five
years by reports of research and statements of theory concerning 
gains in both reading speed and comprehension, and in some 
cases in other perceptual processes, following tachistoscopic train- 
ing with digits or other materials. 

Projectors equipped with tachistoscopes and instruments de- 
signed to present printed material at controlled rates of speed 
have been used alone or in combination with each other in 
numerous programs attempting to improve the reading perform- 
ance of adult subjects. A number of pieces of equipment of both 
types are being offered for sale with the implication, at least, that 
a means for quick improvement in reading has been discovered. 
Data concerning the results obtainable have been moderate in 
amount. 

The present article is a report of one attempt to gain informa- 
tion concerning the amount of improvement in adult reading 
ability which might accompany the use of such equipment. 
Data are presented concerning possible differences in personality
and intelligence test scores between subjects who made large
and small gains in ability to read material presented by means of
the rate-controller. Test scores are also reported for those subjects
who persisted in the training program well beyond their original
objective of thirty-five sessions and for those who dropped
from the experiment before reaching their goal. In addition, the
relationship between training and changes in test score on the 
Minnesota Clerical Test will be examined. This latter test was 
used in an attempt to examine the hypothesis suggested by 
Renshaw and others that tachistoscopic training may lead to a 
general improvement in form perception and that this improve- 
ment may transfer to reading and other special cases of form 
perception. 


METHOD AND MATERIALS


Two reading-rate controllers manufactured by the Three
Dimensional Company of Chicago and two standard 2″ × 2″
slide projectors, each equipped with a Wollensak No. 3 Alphax
shutter permitting exposures from one to 1/100 second, were
available for the experiment. Each projector was equipped with a
Selectron (automatic slide changer) and a number of trays for use
with the Selectron. This equipment made possible an efficient plan
for progressive work, eliminated the mixing of slides, and allowed
the subject to give continuous attention to the training material.
Each tray contained thirty number slides of five to nine digits,
and different trays were used for each day’s work. From thirty
to sixty slides were used for each training session. After the
first three sessions a shutter speed of 1/100 second was used
throughout the experiment. The number of digits on the slides
used by each subject was increased as soon as he became able to
achieve accuracy on about eighty per cent of the slides he was
using.

Eighteen college students volunteered to take extensive 
preliminary and end-tests and to participate in a minimum of 
thirty-five training periods of fifty minutes each. One-half of 
each training period was spent flashing digits onto a small screen
with the tachistoscopic projector, reporting the digits observed,
and checking the correctness of the response. The other half
of the period was devoted to the reading of material on the controller.
The speed of the controller was adjusted by the subject,
who was instructed to increase the speed as he felt he could do so
and still continue to read the selection with understanding.
Popular novels chosen by the subject were used as practice mate- 
rial. The difficulty of the novels varied from fifth to eighth 
grade. The subjects met daily, five days per week, with one of 
the authors present in the laboratory. 

In addition to the record of the speed at which practice mate- 
rial was read on the rate-controller, measures of reading improve- 
ment consisted of weekly tests selected from Penguin Island, by 
Anatole France (tenth- to twelfth-grade difficulty, Dale-Chall 
formula). The Smith-Moler Test of Reading Effectiveness
(Advanced Form B, C, or D) was given as a pre-test, at the
thirty-fifth session, and/or at the end of training.¹ All tests were
administered as in regular test situations and were not presented
by means of the controller.

The procedure for administering the weekly test was to allow 
the subject to read one or two chapters as a warm-up and then to 
read, for test purposes, a two to four thousand word continuation 
of the material under timed conditions. A short test was given 
over the selection after it was read. The Smith-Moler test was 
used to measure reading rate and comprehension on selections of 
three levels of difficulty. 
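
The footnote on the Smith-Moler Test defines per cent of comprehension as the number of correct answers divided by the number of questions answerable from the portion of the selection actually read. A minimal sketch of that arithmetic in Python follows; the function name and the example figures are illustrative assumptions, not taken from the study.

```python
def percent_comprehension(correct_answers: int, answerable_questions: int) -> float:
    """Per cent of comprehension as described in the Smith-Moler footnote.

    `answerable_questions` counts only the questions that could be answered
    from the portion of the selection the subject had read when time was called.
    """
    if answerable_questions <= 0:
        raise ValueError("no questions were answerable from the portion read")
    if not 0 <= correct_answers <= answerable_questions:
        raise ValueError("correct answers must lie between 0 and the answerable total")
    return 100.0 * correct_answers / answerable_questions


# Hypothetical subject: 14 correct answers out of 20 answerable questions.
example_score = percent_comprehension(14, 20)
print(example_score)  # 70.0
```

Scoring against answerable questions only means that a slow reader is not penalized in comprehension for questions on material he never reached; speed is measured separately by the amount read.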

Other data available concerning the subjects included pre- 
training scores on the Wechsler-Bellevue Intelligence Scale, 
Form II, and the Minnesota Multiphasic Personality Inventory. 
In addition, the Minnesota Clerical Test was given before train- 
ing, after the thirty-fifth session, and/or at the end of training. 


RESULTS 


Eighteen subjects completed twenty-five or more training ses- 
sions, thirteen completed thirty-five or more sessions, six com- 
pleted fifty or more, and two completed seventy sessions. 

Table I gives the average speed at which the subjects were 
operating the reading rate controller on the first and fourth 
meetings and on every eighth meeting thereafter. While a 
record of the controller speed was made at each meeting, the 
points reported are sufficient to indicate the increased speed 
observed in use of the reading rate controller. 

It will be noted that speed of reading on the reading rate 
controller rose steadily and does not appear to have reached a 
maximum by the sixty-eighth session. 

¹ The Smith-Moler Test was developed for another study in reading
improvement and is not yet in published form. Each of its four forms
contains three reading selections of approximately 2400 words, 1400
words, and 1200 words of sixth-grade, college freshman, and graduate
level difficulty, respectively. The subject is allowed a specified time
for reading each selection. When time is called he records the amount
read. Per cent of comprehension is obtained by dividing the number of
correct answers by the number of questions which could have been
answered with the information obtainable in the portion of the
selection which was read.


TABLE I. MEAN READING RATE CONTROLLER SETTING FOR THE FIRST AND
FOURTH SESSION AND EVERY EIGHTH SESSION THEREAFTER

Number of     Number of                          Training Sessions
Training      Subjects         1      4     12     20     28     36     44           52     60     68
Periods
25 or more       18          474    587    708    927
35 or more       13          459    550    707    894   1127
50 or more        6          457    556    749    921   1169   1481   1246
70                2          424    380    453    507   1081   1543   1290  (illegible)   1675   2120


The results on the weekly tests are shown in Table II. It
appears that in the case of those subjects who continued training
for thirty-five periods or more, substantial gains were made without
significant comprehension losses. Two subjects who continued
training until the end of the twelfth week (sixty periods)
had on that test an average speed score of 917 words per minute
with seventy per cent comprehension, as compared to their mark
at the end of one week of 291 words per minute and 67.5 per cent
comprehension.


TABLE II. READING SPEED AND COMPREHENSION SCORES AT THE END OF THE
FIRST AND SECOND WEEK AND EVERY TWO WEEKS THEREAFTER

Number of     Number of  Measure                       Weeks
Training      Subjects   Used          1      2      4      6      8     10     12
Periods
25 or more       18      speed       364    335    433
                         comp.        80     70   72.5
35 or more       13      speed       343    317    405    580
                         comp.      80.7     70   75.7   71.9
50 or more        6      speed       366    364    497    615    688    630
                         comp.        81   68.3   77.5     75     74     69
70                2      speed       291    307    476    728    619    854    917
                         comp.      67.5     60     75     60     55     55     70

A summary of the scores made on the Smith-Moler Test by the
thirteen subjects who continued training for thirty-five or more 
periods is presented in Table III. It will be seen that an improve- 
ment of approximately fifty per cent in rate of reading on all 
three sections of the test was made by the thirty-fifth session 
but that in all cases this was accompanied by some drop in com- 
prehension score. 


TABLE III. SCORES ON THE SMITH-MOLER TEST OF READING EFFECTIVENESS
BEFORE AND AFTER THIRTY-FIVE SESSIONS OF TRAINING WITH THE
TACHISTOSCOPE AND READING RATE CONTROLLER
(N = 13)

                            Pre-Test                35th Session
Level of Difficulty     Rate   Comprehension     Rate   Comprehension
Seventh Grade            302        91            447        76.6
College Freshman         179        83            312        73.9
College Graduate         178        75            239        69.2
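
The statement that rate improved by approximately fifty per cent can be checked against the group means printed in Table III. The sketch below, in Python, computes the percentage gain for each level of difficulty; the variable and function names are illustrative assumptions, and the gains are those of the tabled means, not means of individual subjects' gains.

```python
# Mean reading rates from Table III (N = 13).
# Each entry: (pre-test rate, rate at the thirty-fifth session).
table_iii_rates = {
    "Seventh Grade": (302, 447),
    "College Freshman": (179, 312),
    "College Graduate": (178, 239),
}


def percent_gain(before: float, after: float) -> float:
    """Percentage increase of `after` over `before`."""
    return 100.0 * (after - before) / before


gains = {
    level: round(percent_gain(pre, post), 1)
    for level, (pre, post) in table_iii_rates.items()
}
mean_gain = round(
    sum(percent_gain(pre, post) for pre, post in table_iii_rates.values())
    / len(table_iii_rates),
    1,
)

print(gains)      # {'Seventh Grade': 48.0, 'College Freshman': 74.3, 'College Graduate': 34.3}
print(mean_gain)  # 52.2
```

On these figures the gain in the tabled means ranges from about one-third to about three-quarters, depending on the level of difficulty, and averages close to fifty per cent across the three sections.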











Six subjects continued for fifty or more periods of training and 
made a somewhat greater improvement in speed (one hundred 
per cent on the College Freshman level material). They too 
suffered some drop in comprehension score except that a slight 
although probably not statistically significant gain was made in 
comprehension from the thirty-fifth to the fiftieth session on two 
sections of the test. 

Eye movements of the subjects were photographed by means 
of the Ophthalmograph at the beginning of training, at the 
thirty-fifth training session, and/or at the end of training. The 
results of an analysis of the photographs are presented in Table IV. 

The number of fixations necessary for each hundred words 
appears to show a regular and, in most cases, a substantial drop 
and the number of regressions is cut nearly in half. 

TABLE IV. OPHTHALMOGRAPHIC RECORD OF SUBJECTS BEFORE AND AFTER
TRAINING¹

Sessions         N             Rate   Comp²   Span³    Fix⁴   Regress⁵
25 (or more)    18    Pre       425    73.3    3.02    34.0     4.4
                      Post      518    81      3.96    27.8     2.5
35 (or more)    13    Pre       400    69.6    3.11    33.2     4.46
                      Post      540    88.4    4.12    26.8     2.15
50 (or more)     6    Pre       370    65      3.16    32.5     3.3
                      35th      559    83.3    4.0     26.8     1.8
                      Post      545    75      3.81    29.1     2.1
70               2    Pre       260    65      2.89    35.0     6.5
                      35th      633    87.5    4.06    24.0     2.5
                      70th      600    80      4.35    23.5     2.0

¹ Reading standard paragraphs supplied by American Optical Company.
² Comprehension in per cent.
³ Span of recognition in words.
⁴ Fixations per one hundred words.
⁵ Regressions per one hundred words.


An examination was made of the scores on the Minnesota
Multiphasic Personality Inventory in an effort to determine
differences in test patterns between subjects who continued training
for fifty periods or more and subjects who stopped short of their
previously agreed upon goal of thirty-five sessions. The number
of cases involved does not justify more than a few tentative
observations. There were four males who stopped training short
of the goal and two males and two females who continued training
well past the goal. The four members of the group stopping short
of the goal tended to exhibit defensive reactions (higher K score,
66 vs. 59.5), and their scores indicate that they appear to feel that
they are discriminated against² (higher Pa score, 61 vs. 51.5).
The male subjects remaining beyond their original goal showed
greater anxiety (higher D score, 76 vs. 57), although the average
D score for the two female subjects was 50. The males in this
group also showed more deviate patterns in a neurotic direction and
more femininity of interest (MF score 77 vs. 66) than did the
members of the group who failed to reach the goal. About the
only characteristic of the female profiles which distinguished them
from normal was a tendency to over-anticipate.

² The writers are indebted to Dr. William Cottle, assistant director of the
University of Kansas Guidance Bureau, for such interpretations of scores
from the Minnesota Multiphasic Personality Inventory and Minnesota
Clerical Test as are found in this article.

Comparisons were made also between the Multiphasic scores
of the four subjects who made the greatest improvement in reading
rate controller speed from the first to the thirty-fifth session
(all males) and the three males and one female who made the
least improvement. The group making small gain showed greater
anxiety (higher D score, 62 vs. 55), psychasthenic behavior
(Pt score, 64 vs. 52), and tendencies toward paranoia (Pa score,
56 vs. 51). These tendencies may have operated as inhibiting factors
to prevent increments in reading rates. This group appeared to 
have more emotional distractions than did the group making the 
largest gains. The average MF score for the three males making 
largest gains was 78 and for the four making smallest gains it 
was 71. It may be worth noting that ten male subjects who 
continued training for thirty-five periods or more made an average 
MF interest score of 71.6. 

The Minnesota Clerical Test scores of the four subjects who 
attended fifty to seventy training sessions showed a rise from 
55.1 on the numbers and 68.7 on the names portion to 76.5 on the 
numbers and 80.7 on the names. The four subjects who trained 
for approximately twenty-five sessions made an initial score of 
63.5 on the numbers portion of the test and 79 on the names por- 
tion and a final score of 77 on the numbers and 63.3 on the names. 
This loss by the ‘under’ group on the names portion may indicate 
an indifferent attitude in taking the final test. Otherwise it is 
difficult to reconcile the general steady improvement of the ‘over’ 
group and the drop by the ‘under’ group. 

The average Wechsler-Bellevue scores for the group making the
largest gain in reading rate controller setting were 121.1 on the
verbal, 126 on the performance, and 126.2 on the full scale. The
average score of the four subjects making the smallest gain was 127
on the verbal, 117 on the performance, and 124.7 on the full scale.
Thus, the verbal score of the group making the smallest gain was
already markedly higher than was their performance score, while
the reverse was true in the case of those making the largest gain.
This would appear to indicate that the latter group had more
potential for improvement in verbal skills. Had the Wechsler-Bellevue
scores been obtained again at the end of the study, it
would have been interesting to see whether the latter group then
showed a substantial gain in verbal score with a resultant increase
in the general intelligence quotient.

The primary purpose of the Minnesota Clerical Test was to
investigate the types of perceptual improvement other than 
reading which might accompany training with the tachistoscope. 
The results of this test as administered at various points in the 
training program are reported in Table V. 


TABLE V. PERCENTILE SCORES ON THE MINNESOTA CLERICAL TEST BEFORE
TRAINING, AT THE END OF THIRTY-FIVE SESSIONS, AND/OR AT THE END OF
TRAINING

Number of     Number of
Training      Subjects                Pre-Test   35th Session   End-Test
Periods
25 or more       18      Numbers        60.7                      73.9
                         Names          68.2                      75.2
35 or more       13      Numbers        58.5                      71.3
                         Names          69.0                      75.0
50 or more        6      Numbers        57.16       69.91         77.58
                         Names          72.41       80.58         82.83
70                2      Numbers        50.25       65.5          75.75
                         Names          57.25       67.5          (illegible)


Table V appears to indicate that increments in Minnesota 
Clerical Test scores did occur with continued training up to the 
end of the present experiment. Increases for the numbers por- 
tion of the test were considerably greater than for the names 
portion. The reason for this may lie in the rather direct use of 
numbers in the tachistoscopic portion of the training procedure 
and some failure of the gain in perceptual skill to transfer com- 
pletely to a different type of task. 
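
The comparison between the numbers and names portions can be checked from Table V. The sketch below, in Python, uses the percentile scores printed for the six subjects who completed fifty or more periods, the largest group for which all three testings are tabled; the variable and function names are illustrative assumptions.

```python
# Percentile scores from Table V for the group with fifty or more periods (N = 6).
# Order of values: pre-test, thirty-fifth session, end-test.
table_v_fifty_or_more = {
    "Numbers": (57.16, 69.91, 77.58),
    "Names": (72.41, 80.58, 82.83),
}


def gain_in_percentile_points(scores: tuple) -> float:
    """End-test score minus pre-test score, rounded to two decimals."""
    pre_test, _thirty_fifth, end_test = scores
    return round(end_test - pre_test, 2)


clerical_gains = {
    part: gain_in_percentile_points(scores)
    for part, scores in table_v_fifty_or_more.items()
}
print(clerical_gains)  # {'Numbers': 20.42, 'Names': 10.42}
```

For this group the gain on the numbers portion is roughly twice the gain on the names portion, which agrees with the observation made in the text.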


GENERAL COMMENTS 


As shown by their settings of the reading rate controller and by 
their frequently expressed opinions, the subjects appeared to 
believe they were obtaining tremendous improvements in read- 
ing speed as a result of either the tachistoscopic training or the 
practice on the reading rate controller or from a combination of 
the two. 

While the various tests employed indicated substantial
improvement in reading rate, the improvement as measured by
reading tests was not nearly as great as was shown on the controllers.
This may not be taken as conclusive evidence that
reading speed and comprehension were not built to a high level
as a result of the training, nor that such improvement cannot be
transferred to reading independently of the machine. It is clear,
however, that but a small portion of the indicated speed did transfer
in the present experiment, and this at some loss in the area of
comprehension.

However, the improvements obtained in test performance and in
eye-movement records, though low in comparison with the controller
settings, were in themselves rather high, and they indicate that
further experimentation may be of value. On the other hand, the results of this and
other experiments indicate that as yet too little is known con- 
cerning the effect of such equipment on the reading ability of 
persons of various ages, degrees of intelligence, and varying types 
of personality patterns to warrant general use of the equipment 
in remedial reading programs. 


BOOK REVIEWS


MILTON GURVITZ. The Dynamics of Psychological Testing.
New York: Grune and Stratton, 1951.


Although written primarily for clinical students, this book was
inspired by a non-clinician’s comment, “I’d really like to see
some time how you clinicians use these tests.” Gurvitz has
provided a thorough demonstration of one style of clinical 
interpretation. He presents seventeen cases, selected from the 
patients tested during one month in a mental hospital. Typically, 
the case report includes a scored Rorschach protocol, Wechsler 
protocol, figure drawing with inquiry, a discussion of the dynam- 
ics of the case, the original psychological report with interpolated 
agreements or reservations by the therapist who later handled the 
case, a formal case history, and a final summary. 

It is a mistake for Dr. Joseph Miller, who writes the foreword,
to refer to this as research. It is essentially a report of clinical 
experience and opinion. There is no guarantee that Gurvitz did 
not select cases where his diagnostic procedures worked well, and 
so this provides no solid evidence of the validity of the specific 
procedures. The reports do show that tests can contribute 
greatly to the understanding of patients. 

Gurvitz is an able interpreter of behavior. He studies his 
patient as a person, making as much note of his over-all behavior 
as of the test responses. He has a wise viewpoint regarding his 
tests, making excellent critical comments on some of the pro- 
posals of Rapaport, Wechsler, and the Rorschach sign systems. 
The book as a whole shows high level psychological skill in action. 
Gurvitz makes especially good use of the Wechsler and figure- 
drawing data. 

Too often, Gurvitz writes statements which, read by them- 
selves, would encourage mechanical and unintelligent use of tests. 
Once in a while he seems to say that a diagnosis is decided by some 
single response of the patient. He seems, despite his disclaimers, 
to use signs himself in diagnosis. For instance, on the subject
of W:M in Rorschach, he says (p. 21): “If M predominates, then
we have a surfeit of ability and creativeness but insufficient drive
to project it out into the world.” Gurvitz relies heavily on a
Freudian terminology and way of thinking which does not seem
to be an integral part of his diagnostic skill. His dogmatic and
atomistic statements about particular indicators will blind some
readers to the fact that his actual diagnosis is based on a thorough 
integration which allows each fact about the patient to add to the 
significance of each other. 

The Rorschach interpretation suffers greatly from an incautious 
attitude. Gurvitz seems to accept almost every idea of Klopfer
without reexamination. The literature contains enough validation
research by this time to suggest that many items in this
interpretive system are questionable. The statement quoted
about W:M ignores questions of unreliability. It does not admit
that the relation between personality and action is so complex 
that not even the most valid statement will be true of all cases. 
It does not acknowledge that the current literature contains 
almost as many conflicting theories of the meaning of M as there 
are writers. It does not warn the reader that a test indicator 
which goes with some defect in a hospital population will also 
appear among any normal group tested, with no corresponding 
defect. Gurvitz probably makes a more flexible use of the Ror- 
schach, and a more cautious use, than his generalizations will sug- 
gest to the reader. 

It is perhaps too much to hope that writers like Gurvitz will 
follow each suggested interpretation with a statement of the 
percentage of cases for which the interpretation is valid, out of 
all those where the indicator appears. Until they do make state- 
ments in those terms, they can expect that non-clinicians will con- 
tinue to regard clinical psychology as dogmatic and unsound. 

This book can be used profitably with advanced students in 
clinical diagnosis by two types of instructor. The ones who wish 
students to be skeptical of over-enthusiastic and fine-drawn 
interpretations will find here examples worthy of critical attention.
These reports are neither illogical nor lacking in insight.
The reasoning is careful and the psychology insightful; criticism 
must therefore focus on the premises underlying the inter- 
pretation. The instructors who want to teach students to 
squeeze test protocols for all they are worth, without too much 
fretting about lack of validation, should also use this book. If 
students are to learn this brand of clinical psychology, they 
should be taught from a model as skilled as Gurvitz. 

While Gurvitz’ writing does not do justice to his acumen, his 
thinking about tests is in many ways sounder than that of some 
other clinicians and of those who look to a test for a score and
nothing but a score. His concept is worth quoting:

“. . . Our present psychologic technics give us the maximum
amount of information in a minimum of time and with a lessened 
degree of subjectivity. The tests which are used clinically .. . 
are considered to offer a wide range of possible environmental 
stimuli. Conflicts are aroused in miniature to see how they are 
handled or mishandled, anxiety is provoked, basic relationships to 
figures in the environment are encountered, phantasy is forced 
upon the patient—all to duplicate the world in miniature. The 
process of interpreting psychologic procedures is based upon the 
assumption that the person handles the microcosm of the testing 
situation in the same manner as he handles life.” (p. 7)

LEE J. CRONBACH
University of Illinois


GEORGE G. THOMPSON. Child Psychology: Growth Trends in
Psychological Adjustment. Boston: Houghton Mifflin Co., 
1952, pp. 667. $5.50. 


Child psychology has been an important special interest in the 
total domain of psychology since the earliest years of this century. 
In the post-war years the impression is easily gotten that this 
interest has decreased relatively, if not absolutely. Attention is 
being directed so much to industrial, social, and clinical problems 
of adults that the significance of the child is perhaps too much 
neglected. Yet, that there has been sound and important 
research with children, and that the developmental processes are 
of significance in understanding adult behavior, cannot be denied. 
The evidence for this statement is to be found in Thompson’s
book.

As an interpretive exposition of the status of scientific child 
psychology at mid-century, this volume will find an important 
place in the psychological literature. Systematically, the author 
moves from an introduction of child psychology as a scientific 
discipline, through a review of the behavior patterns of the new- 
born, to more extensive considerations of the processes of psycho- 
logical growth and adjustment, the interactions of motivation and 
learning, personal and social adjustment, and finally a survey of 
the theories of personality organization. The organization is 
determined by psychological processes rather than chronological 
age groupings. Changes with age are pointed out, but the
process of growth, rather than its stages, is constantly foremost in 
the argument. Research results and theoretical positions are 
expertly related to each other and woven into a coherent pattern. 
The writing style is clear and readable. As a textbook for 
advanced undergraduates this volume is excellent; there might be
some doubts of the author’s claim that the contents “should be
understandable to the average college freshman” even with
“occasional assistance from the instructor.” Outside of the
college classroom there are many professional workers with children,
and laymen, parents or others, who will find the reading
of this book of immediate interest, and who can turn to it as a
source of information for some time to come. The chapter
bibliographies, totalling nearly one thousand references, alone
are an important guide to the literature of the important and
fascinating field of child psychology.

C. M. LOUTTIT
University of Illinois


WILLIAM C. MORSE, FRANCIS A. BALLANTINE, AND W. ROBERT
DIXON. Studies in the Psychology of Reading. Ann Arbor:
University of Michigan Press, 1951, pp. 188. 


Three research studies on eye movements in reading are 
reported in this volume. Morse’s study deals with changes in eye 
movements when fifth- and seventh-grade pupils read materials 
which are easy, at grade and difficult. The descriptive science 
materials used were equated for difficulty by readability formulas. 
The eye-movement records revealed that seventh-graders were 
more efficient readers than fifth-graders, and that changes in 
difficulty of materials produced no significant changes in reading. 
Thus, eye-movement patterns tend to remain stable in various 
situations. 

Ballantine measured oculomotor changes in silent reading at 
various grade levels from the second through the twelfth. The 
records revealed rapid improvement through Grade IV, little 
change from IV to VI, further gains through VIII and little 
growth thereafter. Again the eye-movement patterns did not 
change with change in difficulty of textual material.

Dixon’s subjects, specialists in physics, history and education,
read material in their own field and in the other fields. The subjects
read material in their own field most efficiently. It is considered
that familiarity of material is a factor in the reading performance
but that different types of material do not produce
different eye-movement patterns.

In all these studies the experimental designs are excellent, and 
the results represent worthy contributions. The reviewer is 
disturbed by the fact that in all three studies a change in difficulty 
or content of reading material brought no changes in oculomotor 
patterns. It is generally accepted that the most effective reader 
is the one who modifies his pace to fit the requirements of the 
content and the difficulty of the textual materials. Lack of such 
flexibility in readers seems unfortunate. Further research on this 
topic is desirable.

MILES A. TINKER
University of Minnesota


DOROTHY C. ADKINS AND SAMUEL B. LYERLY. Factor Analysis
of Reasoning Tests. Chapel Hill: University of North 
Carolina Press, 1952, pp. iv + 122. 


The authors factor-analyzed first a battery of thirty-eight 
selected Air Force tests, in order to obtain leads for selecting tests 
to go into their main experimental battery. The latter contained 
sixty-six tests, which were administered to a fairly representative 
sample of two hundred soldiers. Product-moment intercor- 
relations were computed, and sixteen factors were extracted by the 
complete centroid method and rotated as nearly as possible to an 
oblique simple structure. 

The following reference factors were put in (two tests for each) 
and taken out again: verbal relations, perceptual speed, number, 
word fluency, space 1 (visualization of rigid figures under rotation 
and translation), speed of perceptual closure, ideational fluency, 
and space 2 (visualization of figures whose parts move in relation
to one another). 

Five reasoning factors were identified and described: perception 
of abstract similarities, hypothesis verification, flexibility of 
perceptual closure, deduction, and concept formation. Percep- 
tion of abstract similarities is defined by high loadings on verbal 
classification tests and on analogies tests, both verbal and non- 
verbal. Hypothesis verification is defined essentially by high 
loadings on the progressive matrices test. Flexibility of per- 
ceptual closure is defined by high loadings on figure classification 
tests of the types which require allocation of a fairly large number 
of test items to one of two or three groups after the groups have
been distinguished, and by high loadings on Gottschaldt Figure 
tests and others more or less similar. Deduction is defined mainly 
by high loadings on syllogisms tests and verbal analogies tests. 
Concept formation is defined most clearly by tests which require 
the examinee to supply a general name for a group of words or 
objects. 

In the original analysis, sixteen factors were extracted, even 
though the common-factor variance was substantially exhausted 
after the extraction of the fourteenth. After rotation there 
were three factors which were uninterpretable and did not conform 
well to the criterion of simple structure. All of these factors 
contained reasoning tests of one sort or another with loadings 
above .30; five tests had their highest loadings on one or other 
of these ‘residual’ factors. 

This study shows a remarkable number of points of disagree- 
ment with previous factor analyses. Since it is the first major 
study aimed specifically at determining the nature of the factors 
in the reasoning domain, some disagreement with the results 
of previous work is to be expected. However, the points of dif- 
ference are so numerous and serious that to the present reviewer 
the substantive findings (the factors) must still be considered 
tentative, and their interpretations hypothetical. 

It is possible that a re-analysis of these data might result in a 
large improvement in the interpretability of the results. The 
authors indicate that their subjects also took all tests of the 
Army Classification battery, and that the intercorrelations 
among the ten tests of that battery, as well as their cross-cor- 
relations with the sixty-six variables of the present study, were 
computed. The use of some at least of these additional test 
scores would undoubtedly improve the definition of some of the 
reasoning factors, and especially of some of the reference factors. 
It is also probable that in the case of these data a recomputation 
by principal axes or maximum likelihood would yield a sharp 
cut-off of common factor variance after the fourteenth or perhaps 
some earlier factor. A new rotation might then provide a much 
clearer picture. It is very much to be hoped that such a re- 
analysis will be made. The reviewer is impressed by the effort 
and ingenuity that have gone into this study, but disappointed 
by the inconclusiveness of the results. His own evaluation suggests
that this inconclusiveness may possibly not be intrinsic
to the data.

EDWARD E. CURETON
University of Tennessee


RUTH E. HARTLEY, LAWRENCE K. FRANK AND ROBERT M.
GOLDENSON. Understanding Children’s Play. New York:
Columbia University Press, 1952, pp. 372. $3.50. 


This book is the result of an exploratory study undertaken in 
1947 and sponsored by the Caroline Zachary Institute under a 
grant from the National Institute of Mental Health. The 
purpose of the study was to determine the effect of play experience 
upon personality development of preschool and kindergarten 
children. This study of some one hundred eighty children from 
two to six years of age was organized and supervised by Frank; 
Hartley directed and conducted the project with the assistance of 
Mrs. Ellen Schindel. The New York State Mental Health 
Authority provided additional funds which enabled the directors 
of the exploratory study to discuss their findings and evaluate 
them with groups of teachers, nursery school directors and child 
center directors. Goldenson has taken Hartley’s original manu- 
script which represented the most pertinent findings of the 
project and has condensed and revised it so that it could be of 
most use for directors, teachers, parents, and those concerned 
with the growing child and the promotion of his mental health. 

The nine chapters in the book cover the following topics: 
dramatic play as a mirror of the child and as an instrument of his 
growth, importance of block play as an outlet for childhood 
expression, benefits of water-play, clay not only as a projective 
tool but as a raw material for construction, use of graphic materi- 
als as media for the child’s expression of feelings, finger painting 
not only as a diagnostic device but as a means of creative expres- 
sion, and lastly the combination of music and movement as a 
therapeutic device with children. Each chapter discusses 
pertinent previous studies, if any, gives a wealth of recorded 
observations of children using a particular play medium, includes 
interpretations of the anecdotes and concludes with helpful 
suggestions to teachers and those who come in contact with the 
preschool and kindergarten child. The appendix lists extensive 
suggestions for making observations and recording them; these 
are detailed enough to be used by the layman. Notes and 
bibliographic references are given by chapters separately. A 
fairly adequate index is included. 

The reviewer’s greatest criticism concerns the fact that the 
observations of the play activities of the children included in the 
study were not statistically summarized so that the reader could 
get some idea of the similarities and differences in behavior of the 
well-adjusted and the poorly-adjusted child to the same play 
media. Occasionally the similarity and difference of behavior 
between the inhibited and aggressive child in one medium, say 
finger painting, is pointed out but the only proof offered for such 
similarity is another anecdote concerning another child. 
Throughout the book terms are used loosely with no attempt to 
define meaning. Unfortunately, the book is marred by unverified 
assertions by the authors, which occur on nearly every page 
and which are then interpreted as being 
true for most children. 

The diversity of authorship has led to discontinuity in thought. 
Repetitious phrases occur throughout the book and detract from 
what might have been an interesting and informative account of 
children’s play experience in relation to personality growth. 

The reviewer doubts that teachers, parents, and social workers 
will derive as much benefit from this book as is claimed. Teach- 
ers and social workers with psychological backgrounds may 
understand the implications of the findings, but parents will find 
it difficult to separate interpretation from fact, let alone make use 
of interpretation based on observations of the children used in 
this study. R. ELIZABETH BROWN 

University of Illinois 